CouchDB: The Definitive Guide
Page Count: 272
CouchDB: The Definitive Guide
J. Chris Anderson, Jan Lehnardt, and Noah Slater
Beijing • Cambridge • Farnham • Köln • Sebastopol • Taipei • Tokyo

CouchDB: The Definitive Guide
by J. Chris Anderson, Jan Lehnardt, and Noah Slater

Copyright © 2010 J. Chris Anderson, Jan Lehnardt, and Noah Slater. All rights reserved.
Printed in the United States of America.
Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.

O’Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://my.safaribooksonline.com). For more information, contact our corporate/institutional sales department: 800-998-9938 or corporate@oreilly.com.

Editor: Mike Loukides
Production Editor: Sarah Schneider
Production Services: Appingo, Inc.
Cover Designer: Karen Montgomery
Interior Designer: David Futato
Illustrator: Robert Romano

Printing History:
January 2010: First Edition.

O’Reilly and the O’Reilly logo are registered trademarks of O’Reilly Media, Inc. CouchDB: The Definitive Guide, the image of a Pomeranian dog, and related trade dress are trademarks of O’Reilly Media, Inc.

Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and O’Reilly Media, Inc. was aware of a trademark claim, the designations have been printed in caps or initial caps.
While every precaution has been taken in the preparation of this book, the publisher and authors assume no responsibility for errors or omissions, or for damages resulting from the use of the information contained herein.

This work has been released under the Creative Commons Attribution License. To view a copy of this license, visit http://creativecommons.org/licenses/by/2.0/legalcode or send a letter to Creative Commons, 171 2nd Street, Suite 300, San Francisco, California, 94105, USA.

This book uses RepKover™, a durable and flexible lay-flat binding.

ISBN: 978-0-596-15589-6

For the Web, and all the people who helped me along the way. Thank you.
—J. Chris

For Marita and Kalle.
—Jan

For my parents, God and Damien Katz.
—Noah

Table of Contents

Foreword
Preface

Part I. Introduction

1. Why CouchDB?
   Relax
   A Different Way to Model Your Data
   A Better Fit for Common Applications
   Self-Contained Data
   Syntax and Semantics
   Building Blocks for Larger Systems
   CouchDB Replication
   Local Data Is King
   Wrapping Up
2. Eventual Consistency
   Working with the Grain
   The CAP Theorem
   Local Consistency
   The Key to Your Data
   No Locking
   Validation
   Distributed Consistency
   Incremental Replication
   Case Study
   Wrapping Up
3. Getting Started
   All Systems Are Go!
   Welcome to Futon
   Your First Database and Document
   Running a Query Using MapReduce
   Triggering Replication
   Wrapping Up
4. The Core API
   Server
   Databases
   Documents
   Revisions
   Documents in Detail
   Replication
   Wrapping Up

Part II. Developing with CouchDB

5. Design Documents
   Document Modeling
   The Query Server
   Applications Are Documents
   A Basic Design Document
   Looking to the Future
6. Finding Your Data with Views
   What Is a View?
   Efficient Lookups
   Find One
   Find Many
   Reversed Results
   The View to Get Comments for Posts
   Reduce/Rereduce
   Lessons Learned
   Wrapping Up
7. Validation Functions
   Document Validation Functions
   Validation’s Context
   Writing One
   Type
   Required Fields
   Timestamps
   Authorship
   Wrapping Up
8. Show Functions
   The Show Function API
   Side Effect–Free
   Design Documents
   Querying Show Functions
   Design Document Resources
   Query Parameters
   Accept Headers
   Etags
   Functions and Templates
   The !json Macro
   The !code Macro
   Learning Shows
   Using Templates
   Writing Templates
9. Transforming Views with List Functions
   Arguments to the List Function
   An Example List Function
   List Theory
   Querying Lists
   Lists, Etags, and Caching

Part III. Example Application

10. Standalone Applications
   Use the Correct Version
   Portable JavaScript
   Applications Are Documents
   Standalone
   In the Wild
   Wrapping Up
11. Managing Design Documents
   Working with the Example Application
   Installing CouchApp
   Using CouchApp
   Download the Sofa Source Code
   CouchApp Clone
   ZIP and TAR Files
   Join the Sofa Development Community on GitHub
   The Sofa Source Tree
   Deploying Sofa
   Pushing Sofa to Your CouchDB
   Visit the Application
   Set Up Your Admin Account
   Deploying to a Secure CouchDB
   Configuring CouchApp with .couchapprc
12. Storing Documents
   JSON Document Format
   Beyond _id and _rev: Your Document Data
   The Edit Page
   The HTML Scaffold
   Saving a Document
   Validation
   Save Your First Post
   Wrapping Up
13. Showing Documents in Custom Formats
   Rendering Documents with Show Functions
   The Post Page Template
   Dynamic Dates
14. Viewing Lists of Blog Posts
   Map of Recent Blog Posts
   Rendering the View as HTML Using a List Function
   Sofa’s List Function
   The Final Result

Part IV. Deploying CouchDB

15. Scaling Basics
   Scaling Read Requests
   Scaling Write Requests
   Scaling Data
   Basics First
16. Replication
   The Magic
   Simple Replication with the Admin Interface
   Replication in Detail
   Continuous Replication
   That’s It?
17. Conflict Management
   The Split Brain
   Conflict Resolution by Example
   Working with Conflicts
   Deterministic Revision IDs
   Wrapping Up
18. Load Balancing
   Having a Backup
19. Clustering
   Introducing CouchDB Lounge
   Consistent Hashing
   Redundant Storage
   Redundant Proxies
   View Merging
   Growing the Cluster
   Moving Partitions
   Splitting Partitions

Part V. Reference

20. Change Notifications
   Polling for Changes
   Long Polling
   Continuous Changes
   Filters
   Wrapping Up
21. View Cookbook for SQL Jockeys
   Using Views
   Defining a View
   Querying a View
   MapReduce Functions
   Look Up by Key
   Look Up by Prefix
   Aggregate Functions
   Get Unique Values
   Enforcing Uniqueness
22. Security
   The Admin Party
   Creating New Admin Users
   Hashing Passwords
   Basic Authentication
   Update Validations Again
   Cookie Authentication
   Network Server Security
23. High Performance
   Good Benchmarks Are Non-Trivial
   High Performance CouchDB
   Hardware
   An Implementation Note
   Bulk Inserts and Mostly Monotonic DocIDs
   Optimized Examples: Views and Replication
   Bulk Document Inserts
   Batch Mode
   Single Document Inserts
   Hovercraft
   Trade-Offs
   But…My Boss Wants Numbers!
   A Call to Arms
24. Recipes
   Banking
   Accountants Don’t Use Erasers
   Wrapping Up
   Ordering Lists
   A List of Integers
   A List of Floats
   Pagination
   Example Data
   A View
   Setup
   Slow Paging (Do Not Use)
   Fast Paging (Do Use)
   Jump to Page

Part VI. Appendixes

A. Installing on Unix-like Systems
B. Installing on Mac OS X
C. Installing on Windows
D. Installing from Source
E. JSON Primer
F. The Power of B-trees
Index

Foreword

As the creator of CouchDB, I take great pleasure in writing this Foreword. This book has been a long time coming. I’ve worked on CouchDB since 2005, when it was only a vision in my head and only my wife Laura believed I could make it happen. Now the project has taken on a life of its own, and code is literally running on millions of machines. I couldn’t stop it now if I tried.

A great analogy J. Chris uses is that CouchDB has felt like a boulder we’ve been pushing up a hill. Over time, it’s been moving faster and getting easier to push, and now it’s moving so fast it’s starting to feel like it could get loose and crush some unlucky villagers. Or something. Hey, remember “Tales of the Runaway Boulder” with Robert Wagner on Saturday Night Live? Good times. Well, now we are trying to safely guide that boulder. Because of the villagers. You know what? This boulder analogy just isn’t working. Let’s move on.

The reason for this book is that CouchDB is a very different way of approaching data storage: a way that isn’t inherently better or worse than the ways before—it’s just another tool, another way of thinking about things. It’s missing some features you might be used to, but it’s gained some abilities you’ve maybe never seen. Sometimes it’s an excellent fit for your problems; sometimes it’s terrible.
And sometimes you may be thinking about your problems all wrong. You just need to approach them from a different angle.

Hopefully this book will help you understand CouchDB and the approach that it takes, and also understand how and when it can be used for the problems you face. Otherwise, someday it could become a runaway boulder, being misused and causing disasters that could have been avoided. And I’ll be doing my best Charlton Heston imitation, on the ground, pounding the dirt, yelling, “You maniacs! You blew it up! Ah, damn you! God damn you all to hell!” Or something like that.

—Damien Katz
Creator of CouchDB

Preface

Thanks for purchasing this book! If it was a gift, then congratulations. If, on the other hand, you downloaded it without paying, well, actually, we’re pretty happy about that too! This book is available under a free license, and that’s important because we want it to serve the community as documentation—and documentation should be free.

So, why pay for a free book? Well, you might like the warm fuzzy feeling you get from holding a book in your hands, as you cosy up on the couch with a cup of coffee. On the couch...get it? Bad jokes aside, whatever your reasons, buying the book helps support us, so we have more time to work on improvements for both the book and CouchDB. So thank you!

We set out to compile the best and most comprehensive collection of CouchDB information there is, and yet we know we failed. CouchDB is a fast-moving target and grew significantly during the time we were writing the book. We were able to adapt quickly and keep things up-to-date, but we also had to draw the line somewhere if we ever hoped to publish it. At the time of this writing, CouchDB 0.10.1 is the latest release, but you might already be seeing 0.10.2 or even 0.11.0 released or being prepared—maybe even 1.0.
Although we have some ideas about how future releases will look, we don’t know for certain and didn’t want to make any wild guesses. CouchDB is a community project, so ultimately it’s up to you, our readers, to help shape the project.

On the plus side, many people successfully run CouchDB 0.10 in production, and you will have more than enough on your hands to run a solid project. Future releases of CouchDB will make things easier in places, but the core features should remain the same. Besides, learning the core features helps you understand and appreciate the shortcuts and allows you to roll your own hand-tailored solutions.

Writing an open book was great fun. We’re happy O’Reilly supported our decision in every way possible. The best part—besides giving the CouchDB community early access to the material—was the commenting functionality we implemented on the book’s website. It allows anybody to comment on any paragraph in the book with a simple click. We used some simple JavaScript and Google Groups to allow painless commenting. The result was astounding. As of today, 866 people have sent more than 1,100 messages to our little group. Submissions have ranged from pointing out small typos to deep technical discussions. Feedback on our original first chapter led us to a complete rewrite in order to make sure the points we wanted to get across did, indeed, get across. This system allowed us to clearly formulate what we wanted to say in a way that worked for you, our readers.

Overall, the book has become so much better because of the help of hundreds of volunteers who took the time to send in their suggestions. We understand the immense value this model has, and we want to keep it up. New features in CouchDB should make it into the book without us necessarily having to do a reprint every three months.
The publishing industry is not ready for that yet, but we want to continue to release new and revised content and listen closely to the feedback. The specifics of how we’ll do this are still in flux, but we’ll be posting the information to the book’s website the first moment we know it. That’s a promise! So make sure to visit the book’s website at http://books.couchdb.org/relax to keep up-to-date.

Before we let you dive into the book, we want to make sure you’re well prepared. CouchDB is written in Erlang, but you don’t need to know anything about Erlang to use CouchDB. CouchDB also heavily relies on web technologies like HTTP and JavaScript, and some experience with those does help when following the examples throughout the book. If you have built a website before—simple or complex—you should be ready to go.

If you are an experienced developer or systems architect, the introduction to CouchDB should be comforting, as you already know everything involved—all you need to learn are the ways CouchDB puts them together. Toward the end of the book, we ramp up the experience level to help you get as comfortable building large-scale CouchDB systems as you are with personal projects. If you are a beginning web developer, don’t worry—by the time you get to the later parts of the book, you should be able to follow along with the harder stuff.

Now, sit back, relax, and enjoy the ride through the wonderful world of CouchDB.

Using Code Examples

This book is here to help you get your job done. In general, you may use the code in this book in your programs and documentation. You do not need to contact us for permission unless you’re reproducing a significant portion of the code. For example, writing a program that uses several chunks of code from this book does not require permission. Selling or distributing a CD-ROM of examples from O’Reilly books does require permission. Answering a question by citing this book and quoting example code does not require permission.
Incorporating a significant amount of example code from this book into your product’s documentation does require permission.

This work is licensed under the Creative Commons Attribution License. To view a copy of this license, visit http://creativecommons.org/licenses/by/2.0/legalcode or send a letter to Creative Commons, 171 2nd Street, Suite 300, San Francisco, California, 94105, USA.

An attribution usually includes the title, author, publisher, and ISBN. For example: “CouchDB: The Definitive Guide by J. Chris Anderson, Jan Lehnardt, and Noah Slater. Copyright 2010 J. Chris Anderson, Jan Lehnardt, and Noah Slater, 978-0-596-15589-6.”

If you feel your use of code examples falls outside fair use or the permission given above, feel free to contact us at permissions@oreilly.com.

Conventions Used in This Book

The following typographical conventions are used in this book:

Italic
  Indicates new terms, URLs, email addresses, filenames, and file extensions.
Constant width
  Used for program listings, as well as within paragraphs to refer to program elements such as variable or function names, databases, data types, environment variables, statements, and keywords.
Constant width bold
  Shows commands or other text that should be typed literally by the user.
Constant width italic
  Shows text that should be replaced with user-supplied values or by values determined by context.

This icon signifies a tip, suggestion, or general note.

This icon indicates a warning or caution.

Safari® Books Online

Safari Books Online is an on-demand digital library that lets you easily search over 7,500 technology and creative reference books and videos to find the answers you need quickly.

With a subscription, you can read any page and watch any video from our library online. Read books on your cell phone and mobile devices.
Access new titles before they are available for print, and get exclusive access to manuscripts in development and post feedback for the authors. Copy and paste code samples, organize your favorites, download chapters, bookmark key sections, create notes, print out pages, and benefit from tons of other time-saving features.

O’Reilly Media has uploaded this book to the Safari Books Online service. To have full digital access to this book and others on similar topics from O’Reilly and other publishers, sign up for free at http://my.safaribooksonline.com.

How to Contact Us

Please address comments and questions concerning this book to the publisher:

O’Reilly Media, Inc.
1005 Gravenstein Highway North
Sebastopol, CA 95472
800-998-9938 (in the United States or Canada)
707-829-0515 (international or local)
707-829-0104 (fax)

We have a web page for this book, where we list errata, examples, and any additional information. You can access this page at:

http://www.oreilly.com/catalog/9780596155896

To comment or ask technical questions about this book, send email to:

bookquestions@oreilly.com

For more information about our books, conferences, Resource Centers, and the O’Reilly Network, see our website at:

http://www.oreilly.com

Acknowledgments

J. Chris

I would like to acknowledge all the committers of CouchDB, the people sending patches, and the rest of the community. I couldn’t have done it without my wife, Amy, who helps me think about the big picture; without the patience and support of my coauthors and O’Reilly; nor without the help of everyone who helped us hammer out book content details on the mailing lists. And a shout-out to the copyeditor, who was awesome!

Jan

I would like to thank the CouchDB community.
Special thanks go out to a number of nice people all over the place who invited me to attend or talk at a conference, who let me sleep on their couches (pun most definitely intended), and who made sure I had a good time when I was abroad presenting CouchDB. There are too many to name, but all of you in Dublin, Portland, Lisbon, London, Zurich, San Francisco, Mountain View, Dortmund, Stockholm, Hamburg, Frankfurt, Salt Lake City, Blacksburg, San Diego, and Amsterdam: you know who you are—thanks!

To my family, friends, and coworkers: thank you for your support and your patience with me over the last year. You won’t hear, “I’ve got to leave early, I have a book to write” from me anytime soon, promise! Anna, you believe in me; I couldn’t have done this without you.

Noah

I would like to thank O’Reilly for their enthusiasm for CouchDB and for realizing the importance of free documentation. And of course, I’d like to thank Jan and J. Chris for being so great to work with. But a special thanks goes out to the whole CouchDB community, for making everything so fun and rewarding. Without you guys, none of this would be possible. And if you’re reading this, that means you!

PART I
Introduction

CHAPTER 1
Why CouchDB?

Apache CouchDB is one of a new breed of database management systems. This chapter explains why there’s a need for new systems as well as the motivations behind building CouchDB.

As CouchDB developers, we’re naturally very excited to be using CouchDB. In this chapter we’ll share with you the reasons for our enthusiasm.
We’ll show you how CouchDB’s schema-free document model is a better fit for common applications, how the built-in query engine is a powerful way to use and process your data, and how CouchDB’s design lends itself to modularization and scalability.

Relax

If there’s one word to describe CouchDB, it is relax. It is in the title of this book, it is the byline to CouchDB’s official logo, and when you start CouchDB, you see:

Apache CouchDB has started. Time to relax.

Why is relaxation important? Developer productivity roughly doubled in the last five years. The chief reason for the boost is more powerful tools that are easier to use. Take Ruby on Rails as an example. It is an enormously complex framework, but it’s easy to get started with. Rails is a success story because of its core design focus on ease of use. This is one reason why CouchDB is relaxing: learning CouchDB and understanding its core concepts should feel natural to almost everybody who has been doing any work on the Web. And it is still pretty easy to explain to non-technical people.

Getting out of the way when creative people try to build specialized solutions is in itself a core feature and one thing that CouchDB aims to get right. We found existing tools too cumbersome to work with during development or in production, and decided to focus on making CouchDB easy, even a pleasure, to use. Chapters 3 and 4 will demonstrate the intuitive HTTP-based REST API.

Another area of relaxation for CouchDB users is the production setting. If you have a live running application, CouchDB again goes out of its way to avoid troubling you. Its internal architecture is fault-tolerant, and failures occur in a controlled environment and are dealt with gracefully. A single problem does not cascade through an entire server system; it stays isolated within the request that caused it. CouchDB’s core concepts are simple (yet powerful) and well understood.
Operations teams (if you have a team; otherwise, that’s you) do not have to fear random behavior and untraceable errors. If anything should go wrong, you can easily find out what the problem is—but these situations are rare.

CouchDB is also designed to handle varying traffic gracefully. For instance, if a website is experiencing a sudden spike in traffic, CouchDB will generally absorb a lot of concurrent requests without falling over. It may take a little more time for each request, but they all get answered. When the spike is over, CouchDB will work at regular speed again.

The third area of relaxation is growing and shrinking the underlying hardware of your application. This is commonly referred to as scaling. CouchDB enforces a set of limits on the programmer. At first glance, CouchDB might seem inflexible, but some features are left out by design for the simple reason that if CouchDB supported them, it would allow a programmer to create applications that couldn’t deal with scaling up or down. We’ll explore the whole matter of scaling CouchDB in Part IV, Deploying CouchDB.

In a nutshell: CouchDB doesn’t let you do things that would get you in trouble later on. This sometimes means you’ll have to unlearn best practices you might have picked up in your current or past work. Chapter 24 contains a list of common tasks and how to solve them in CouchDB.

A Different Way to Model Your Data

We believe that CouchDB will drastically change the way you build document-based applications. CouchDB combines an intuitive document storage model with a powerful query engine in a way that’s so simple you’ll probably be tempted to ask, “Why has no one built something like this before?”

Django may be built for the Web, but CouchDB is built of the Web. I’ve never seen software that so completely embraces the philosophies behind HTTP. CouchDB makes Django look old-school in the same way that Django makes ASP look outdated.
—Jacob Kaplan-Moss, Django developer

CouchDB’s design borrows heavily from web architecture and the concepts of resources, methods, and representations. It augments this with powerful ways to query, map, combine, and filter your data. Add fault tolerance, extreme scalability, and incremental replication, and CouchDB defines a sweet spot for document databases.

A Better Fit for Common Applications

We write software to improve our lives and the lives of others. Usually this involves taking some mundane information—such as contacts, invoices, or receipts—and manipulating it using a computer application. CouchDB is a great fit for common applications like this because it embraces the natural idea of evolving, self-contained documents as the very core of its data model.

Self-Contained Data

An invoice contains all the pertinent information about a single transaction—the seller, the buyer, the date, and a list of the items or services sold. As shown in Figure 1-1, there’s no abstract reference on this piece of paper that points to some other piece of paper with the seller’s name and address. Accountants appreciate the simplicity of having everything in one place. And given the choice, programmers appreciate that, too.

Figure 1-1. Self-contained documents

Yet using references is exactly how we model our data in a relational database! Each invoice is stored in a table as a row that refers to other rows in other tables—one row for seller information, one for the buyer, one row for each item billed, and more rows still to describe the item details, manufacturer details, and so on and so forth.

This isn’t meant as a detraction from the relational model, which is widely applicable and extremely useful for a number of reasons. Hopefully, though, it illustrates the point that sometimes your model may not “fit” your data in the way it occurs in the real world.
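A minimal Python sketch of the contrast, assuming a hypothetical invoice: every field name, key, and amount below is invented for illustration, and plain Python dicts stand in both for a self-contained document (the kind of JSON object CouchDB stores) and for normalized relational tables. Both models carry the same facts, but only the normalized one has to follow references to reassemble an invoice.

```python
# Hypothetical invoice modeled two ways; all names and values are illustrative.

# Self-contained: everything the invoice needs lives in one document.
invoice_doc = {
    "_id": "invoice-2010-0042",
    "date": "2010-01-15",
    "seller": {"name": "Acme Widgets", "address": "1 Main St"},
    "buyer": {"name": "Jan", "address": "2 Side St"},
    "items": [
        {"description": "Widget", "quantity": 3, "unit_price": 250},
        {"description": "Gadget", "quantity": 1, "unit_price": 1000},
    ],
}

# Normalized: the same facts split across "tables" that refer to each other by key.
parties = {
    1: {"name": "Acme Widgets", "address": "1 Main St"},
    2: {"name": "Jan", "address": "2 Side St"},
}
invoices = {42: {"date": "2010-01-15", "seller_id": 1, "buyer_id": 2}}
line_items = [
    {"invoice_id": 42, "description": "Widget", "quantity": 3, "unit_price": 250},
    {"invoice_id": 42, "description": "Gadget", "quantity": 1, "unit_price": 1000},
]


def assemble_invoice(invoice_id: int) -> dict:
    """Follow the references (a hand-rolled join) to rebuild one invoice."""
    row = invoices[invoice_id]
    return {
        "date": row["date"],
        "seller": parties[row["seller_id"]],
        "buyer": parties[row["buyer_id"]],
        "items": [
            {k: v for k, v in item.items() if k != "invoice_id"}
            for item in line_items
            if item["invoice_id"] == invoice_id
        ],
    }


def total(invoice: dict) -> int:
    """Sum quantity * unit_price over all line items (amounts in cents)."""
    return sum(i["quantity"] * i["unit_price"] for i in invoice["items"])


# Both models hold the same information; only the normalized one needs lookups.
doc_without_id = {k: v for k, v in invoice_doc.items() if k != "_id"}
print(assemble_invoice(42) == doc_without_id)  # True
print(total(invoice_doc))                      # 1750
```

The helper `assemble_invoice` does by hand what a join would do in a relational database; the self-contained document needs no such step before it can be read or totaled.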
Let’s take a look at the humble contact database to illustrate a different way of modeling data, one that more closely “fits” its real-world counterpart—a pile of business cards. Much like our invoice example, a business card contains all the important information, right there on the cardstock. We call this “self-contained” data, and it’s an important concept in understanding document databases like CouchDB.

Syntax and Semantics

Most business cards contain roughly the same information—someone’s identity, an affiliation, and some contact information. While the exact form of this information can vary between business cards, the general information being conveyed remains the same, and we’re easily able to recognize it as a business card. In this sense, we can describe a business card as a real-world document.

Jan’s business card might contain a phone number but no fax number, whereas J. Chris’s business card contains both a phone and a fax number. Jan does not have to make his lack of a fax machine explicit by writing something as ridiculous as “Fax: None” on the business card. Instead, simply omitting a fax number implies that he doesn’t have one.

We can see that real-world documents of the same type, such as business cards, tend to be very similar in semantics—the sort of information they carry—but can vary hugely in syntax, or how that information is structured. As human beings, we’re naturally comfortable dealing with this kind of variation.

While a traditional relational database requires you to model your data up front, CouchDB’s schema-free design spares you that step and gives you a powerful way to aggregate your data after the fact, just like we do with real-world documents. We’ll look in depth at how to design applications with this underlying storage paradigm.

Building Blocks for Larger Systems

CouchDB is a storage system useful on its own.
You can build many applications with the tools CouchDB gives you. But CouchDB is designed with a bigger picture in mind. Its components can be used as building blocks that solve storage problems in slightly different ways for larger and more complex systems.

Whether you need a system that’s crazy fast but isn’t too concerned with reliability (think logging), or one that guarantees storage in two or more physically separated locations for reliability and you’re willing to take a performance hit for it, CouchDB lets you build these systems.

There are many knobs you could turn to make a system work better in one area, but doing so will affect another area. One example is the CAP theorem, discussed in the next chapter. To give you an idea of other things that affect storage systems, see Figures 1-2 and 1-3. By reducing latency for a given system (and that is true not only for storage systems), you affect concurrency and throughput capabilities.

Figure 1-2. Throughput, latency, or concurrency

Figure 1-3. Scaling: read requests, write requests, or data

When you want to scale out, there are three distinct issues to deal with: scaling read requests, write requests, and data. Orthogonal to all three and to the items shown in Figures 1-2 and 1-3 are many more attributes like reliability or simplicity. You can draw many of these graphs that show how different features or attributes pull in different directions and thus shape the system they describe.

CouchDB is very flexible and gives you enough building blocks to create a system shaped to suit your exact problem. That’s not to say that CouchDB can be bent to solve any problem—CouchDB is no silver bullet—but in the area of data storage, it can get you a long way.
CouchDB Replication

CouchDB replication is one of these building blocks. Its fundamental function is to synchronize two or more CouchDB databases. This may sound simple, but the simplicity is key to allowing replication to solve a number of problems: reliably synchronizing databases between multiple machines for redundant data storage; distributing data to a cluster of CouchDB instances that share a subset of the total number of requests that hit the cluster (load balancing); and distributing data between physically distant locations, such as one office in New York and another in Tokyo.

CouchDB replication uses the same REST API all clients use. HTTP is ubiquitous and well understood. Replication works incrementally; that is, if anything goes wrong during replication, like dropping your network connection, it will pick up where it left off the next time it runs. It also transfers only the data that is needed to synchronize databases.

A core assumption CouchDB makes is that things can go wrong, like network connection troubles, and it is designed for graceful error recovery instead of assuming all will be well. The replication system’s incremental design shows that best. The ideas behind “things that can go wrong” are embodied in the Fallacies of Distributed Computing:*

1. The network is reliable.
2. Latency is zero.
3. Bandwidth is infinite.
4. The network is secure.
5. Topology doesn’t change.
6. There is one administrator.
7. Transport cost is zero.
8. The network is homogeneous.

Existing tools often try to hide the fact that there is a network and that any or all of the previous conditions don’t hold for a particular system. This usually results in fatal error scenarios when something finally goes wrong. In contrast, CouchDB doesn’t try to hide the network; it just handles errors gracefully and lets you know when actions on your end are required.
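The incremental behavior described above can be illustrated with a toy model. This is not CouchDB’s replicator: it assumes hypothetical in-memory databases that tag every write with an increasing sequence number, and a replicator that stores the last sequence it copied (a checkpoint). If a run is interrupted, the next run resumes from the checkpoint and transfers only changes it has not yet seen. The `failAfter` parameter is an illustrative knob that simulates a dropped connection.

```javascript
// Toy in-memory database: every write gets the next sequence number.
class ToyDb {
  constructor() {
    this.docs = new Map(); // id -> latest document body
    this.changes = [];     // [{ seq, id }] in write order
    this.seq = 0;
  }

  put(id, body) {
    this.seq += 1;
    this.docs.set(id, body);
    this.changes.push({ seq: this.seq, id });
  }

  // Changes after `since`, keeping only the latest entry per document.
  changesSince(since) {
    const latest = new Map();
    for (const change of this.changes) {
      if (change.seq > since) latest.set(change.id, change.seq);
    }
    return [...latest]
      .map(([id, seq]) => ({ id, seq }))
      .sort((a, b) => a.seq - b.seq);
  }
}

// One-way replication from source to target, resuming from a checkpoint.
function replicate(source, target, checkpoint, failAfter = Infinity) {
  let copied = 0;
  for (const { id, seq } of source.changesSince(checkpoint.seq)) {
    if (copied === failAfter) {
      return { ok: false, copied }; // "connection dropped"; progress is kept
    }
    target.put(id, source.docs.get(id));
    checkpoint.seq = seq; // record progress after each document
    copied += 1;
  }
  return { ok: true, copied };
}
```

On a real server, replication is started over HTTP, for example by POSTing a JSON body naming a `source` and a `target` database to `/_replicate`; the bookkeeping above only mirrors the idea of resuming from a checkpoint.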
* http://en.wikipedia.org/wiki/Fallacies_of_Distributed_Computing

Local Data Is King

CouchDB takes quite a few lessons learned from the Web, but there is one thing that could be improved about the Web: latency. Whenever you have to wait for an application to respond or a website to render, you almost always wait for a network connection that isn’t as fast as you want it at that point. Waiting a few seconds instead of milliseconds greatly affects user experience and thus user satisfaction.

What do you do when you are offline? This happens all the time—your DSL or cable provider has issues, or your iPhone, G1, or BlackBerry has no bars, and no connectivity means no way to get to your data.

CouchDB can solve this scenario as well, and this is where scaling is important again. This time it is scaling down. Imagine CouchDB installed on phones and other mobile devices that can synchronize data with centrally hosted CouchDBs when they are on a network. The synchronization is not bound by user interface constraints like subsecond response times. It is easier to tune for high bandwidth and higher latency than for low bandwidth and very low latency. Mobile applications can then use the local CouchDB to fetch data, and since no remote networking is required for that, latency is low by default.

Can you really use CouchDB on a phone? Erlang, CouchDB’s implementation language, has been designed to run on embedded devices orders of magnitude smaller and less powerful than today’s phones.

Wrapping Up

The next chapter further explores the distributed nature of CouchDB. We should have given you enough bites to whet your interest. Let’s go!
CHAPTER 2 Eventual Consistency

In the previous chapter, we saw that CouchDB’s flexibility allows us to evolve our data as our applications grow and change. In this chapter, we’ll explore how working “with the grain” of CouchDB promotes simplicity in our applications and helps us naturally build scalable, distributed systems.

Working with the Grain

A distributed system is a system that operates robustly over a wide network. A particular feature of network computing is that network links can potentially disappear, and there are plenty of strategies for managing this type of network segmentation. CouchDB differs from others by accepting eventual consistency, as opposed to putting absolute consistency ahead of raw availability, as relational databases and Paxos-based systems do. What these systems have in common is an awareness that data acts differently when many people are accessing it simultaneously. Their approaches differ when it comes to which aspects of consistency, availability, or partition tolerance they prioritize.

Engineering distributed systems is tricky. Many of the caveats and “gotchas” you will face over time aren’t immediately obvious. We don’t have all the solutions, and CouchDB isn’t a panacea, but when you work with CouchDB’s grain rather than against it, the path of least resistance leads you to naturally scalable applications.

Of course, building a distributed system is only the beginning. A website with a database that is available only half the time is next to worthless. Unfortunately, the traditional relational database approach to consistency makes it very easy for application programmers to rely on global state, global clocks, and other high availability no-nos, without even realizing that they’re doing so. Before examining how CouchDB promotes scalability, we’ll look at the constraints faced by a distributed system.
After we’ve seen the problems that arise when parts of your application can’t rely on being in constant contact with each other, we’ll see that CouchDB provides an intuitive and useful way for modeling applications around high availability.

The CAP Theorem

The CAP theorem describes a few different strategies for distributing application logic across networks. CouchDB’s solution uses replication to propagate application changes across participating nodes. This is a fundamentally different approach from consensus algorithms and relational databases, which operate at different intersections of consistency, availability, and partition tolerance.

The CAP theorem, shown in Figure 2-1, identifies three distinct concerns:

Consistency: All database clients see the same data, even with concurrent updates.
Availability: All database clients are able to access some version of the data.
Partition tolerance: The database can be split over multiple servers and keeps operating even when network failures cut some servers off from others.

Pick two.

Figure 2-1. The CAP theorem

When a system grows large enough that a single database node is unable to handle the load placed on it, a sensible solution is to add more servers. When we add nodes, we have to start thinking about how to partition data between them. Do we have a few databases that share exactly the same data? Do we put different sets of data on different database servers? Do we let only certain database servers write data and let others handle the reads?

Regardless of which approach we take, the one problem we’ll keep bumping into is that of keeping all these database servers in synchronization. If you write some information to one node, how are you going to make sure that a read request to another database server reflects this newest information? These events might be milliseconds apart.
Even with a modest collection of database servers, this problem can become extremely complex. When it’s absolutely critical that all clients see a consistent view of the database, the users of one node will have to wait for any other nodes to come into agreement before being able to read from or write to the database. In this instance, we see that availability takes a backseat to consistency. However, there are situations where availability trumps consistency:

Each node in a system should be able to make decisions purely based on local state. If you need to do something under high load with failures occurring and you need to reach agreement, you’re lost. If you’re concerned about scalability, any algorithm that forces you to run agreement will eventually become your bottleneck. Take that as a given.
—Werner Vogels, Amazon CTO and Vice President

If availability is a priority, we can let clients write data to one node of the database without waiting for other nodes to come into agreement. If the database knows how to take care of reconciling these operations between nodes, we achieve a sort of “eventual consistency” in exchange for high availability. This is a surprisingly applicable trade-off for many applications.

Unlike traditional relational databases, where each action performed is necessarily subject to database-wide consistency checks, CouchDB makes it really simple to build applications that sacrifice immediate consistency for the huge performance improvements that come with simple distribution.

Local Consistency

Before we attempt to understand how CouchDB operates in a cluster, it’s important that we understand the inner workings of a single CouchDB node. The CouchDB API is designed to provide a convenient but thin wrapper around the database core. By taking a closer look at the structure of the database core, we’ll have a better understanding of the API that surrounds it.

The Key to Your Data

At the heart of CouchDB is a powerful B-tree storage engine.
A B-tree is a sorted data structure that allows for searches, insertions, and deletions in logarithmic time. As Figure 2-2 illustrates, CouchDB uses this B-tree storage engine for all internal data, documents, and views. If we understand one, we will understand them all.

Figure 2-2. Anatomy of a view request

CouchDB uses MapReduce to compute the results of a view. MapReduce makes use of two functions, “map” and “reduce.” The map function is applied to each document in isolation, and the reduce function combines the rows that map produces. Being able to isolate these operations means that view computation lends itself to parallel and incremental computation. More important, because these functions produce key/value pairs, CouchDB is able to insert them into the B-tree storage engine, sorted by key. Lookups by key, or key range, are extremely efficient operations with a B-tree, described in big O notation as O(log N) and O(log N + K), respectively, where N is the number of entries in the tree and K is the number of entries returned for the range.

In CouchDB, we access documents and view results by key or key range. This is a direct mapping to the underlying operations performed on CouchDB’s B-tree storage engine. Along with document inserts and updates, this direct mapping is the reason we describe CouchDB’s API as being a thin wrapper around the database core.

Being able to access results by key alone is a very important restriction because it allows us to make huge performance gains. As well as the massive speed improvements, we can partition our data over multiple nodes, without affecting our ability to query each node in isolation. BigTable, Hadoop, SimpleDB, and memcached restrict object lookups by key for exactly these reasons.

No Locking

A table in a relational database is a single data structure. If you want to modify a table—say, update a row—the database system must ensure that nobody else is trying to update that row and that nobody can read from that row while it is being updated.
The common way to handle this uses what’s known as a lock. If multiple clients want to access a table, the first client gets the lock, making everybody else wait. When the first client’s request is processed, the next client is given access while everybody else waits, and so on. This serial execution of requests, even when they arrived in parallel, wastes a significant amount of your server’s processing power. Under high load, a relational database can spend more time figuring out who is allowed to do what, and in which order, than it does doing any actual work.

Instead of locks, CouchDB uses Multi-Version Concurrency Control (MVCC) to manage concurrent access to the database. Figure 2-3 illustrates the differences between MVCC and traditional locking mechanisms. MVCC means that CouchDB can run at full speed, all the time, even under high load. Requests are run in parallel, making excellent use of every last drop of processing power your server has to offer.

Figure 2-3. MVCC means no locking

Documents in CouchDB are versioned, much as they would be in a regular version control system such as Subversion. If you want to change a value in a document, you create an entirely new version of that document and save it over the old one. After doing this, you end up with two versions of the same document, one old and one new.

How does this offer an improvement over locks? Consider a set of requests wanting to access a document. The first request reads the document. While this is being processed, a second request changes the document. Since the second request includes a completely new version of the document, CouchDB can simply append it to the database without having to wait for the read request to finish. When a third request wants to read the same document, CouchDB will point it to the new version that has just been written.
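That sequence of three requests can be sketched with a toy append-only store. This is a simplification for illustration, not CouchDB’s storage engine: the `VersionedStore` class is hypothetical, and its revisions are plain counters. A write never modifies an existing version; it appends a new one. A reader that already holds a version keeps working with it, while later readers get the newest version.

```javascript
// Append-only store: each document ID maps to a list of immutable versions.
class VersionedStore {
  constructor() {
    this.versions = new Map(); // id -> array of frozen versions, oldest first
  }

  // A write appends a new version; versions already stored are never modified.
  write(id, body) {
    if (!this.versions.has(id)) this.versions.set(id, []);
    const history = this.versions.get(id);
    const version = Object.freeze({ rev: history.length + 1, body: Object.freeze({ ...body }) });
    history.push(version);
    return version.rev;
  }

  // A read returns whatever version is newest at the moment it is requested.
  read(id) {
    const history = this.versions.get(id);
    return history ? history[history.length - 1] : undefined;
  }
}

const store = new VersionedStore();
store.write("playlist", { title: "Tango" });
const firstRead = store.read("playlist");               // request 1 starts reading
store.write("playlist", { title: "Tango (extended)" }); // request 2 writes without waiting
const thirdRead = store.read("playlist");               // request 3 sees the new version
console.log(firstRead.body.title, "|", thirdRead.body.title); // prints "Tango | Tango (extended)"
```

No request ever waits on another: the writer appends, and each reader simply holds on to the version it was given.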
During this whole process, the first request could still be reading the original version. A read request always sees the most recent snapshot of your database as of the moment the read began.

Validation

As application developers, we have to think about what sort of input we should accept and what we should reject. The expressive power to do this type of validation over complex data within a traditional relational database leaves a lot to be desired. Fortunately, CouchDB provides a powerful way to perform per-document validation from within the database.

CouchDB can validate documents using JavaScript functions similar to those used for MapReduce. Each time you try to modify a document, CouchDB will pass the validation function a copy of the existing document, a copy of the new document, and a collection of additional information, such as user authentication details. The validation function now has the opportunity to approve or deny the update.

By working with the grain and letting CouchDB do this for us, we save ourselves a tremendous amount of CPU cycles that would otherwise have been spent serializing object graphs from SQL, converting them into domain objects, and using those objects to do application-level validation.

Distributed Consistency

Maintaining consistency within a single database node is relatively easy for most databases. The real problems start to surface when you try to maintain consistency between multiple database servers. If a client makes a write operation on server A, how do we make sure that this is consistent with server B, or C, or D? For relational databases, this is a very complex problem with entire books devoted to its solution. You could use multi-master, master/slave, partitioning, sharding, write-through caches, and all sorts of other complex techniques.
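Before turning to replication, here is a sketch of the per-document validation described in the Validation section. CouchDB looks for such a function in a design document’s `validate_doc_update` field and calls it with the new document, the existing document (`null` for a new one), and a user context object; throwing an object with a `forbidden` (or `unauthorized`) key rejects the update. The specific rules below (a required `author` that cannot change and must match the user) are hypothetical examples, and `tryUpdate` is an illustrative harness that only imitates how the server consults the function.

```javascript
// Validation function in the style CouchDB expects: (newDoc, oldDoc, userCtx).
// The rules are illustrative, not required by CouchDB.
function validate(newDoc, oldDoc, userCtx) {
  if (newDoc._deleted) {
    return; // allow deletions in this example
  }
  if (!newDoc.author) {
    throw { forbidden: "Documents must have an author." };
  }
  if (oldDoc && oldDoc.author !== newDoc.author) {
    throw { forbidden: "The author of a document cannot be changed." };
  }
  if (newDoc.author !== userCtx.name) {
    throw { unauthorized: "You may only save documents you author." };
  }
}

// Hypothetical local harness: approve or deny an update the way the server would.
function tryUpdate(newDoc, oldDoc, userCtx) {
  try {
    validate(newDoc, oldDoc, userCtx);
    return { ok: true };
  } catch (err) {
    return { ok: false, error: err };
  }
}
```

Because the function sees both the old and new document, rules about how a document may change (not just what it must contain) are easy to express.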
Incremental Replication

Because CouchDB operations take place within the context of a single document, if you want to use two database nodes, you no longer have to worry about them staying in constant communication. CouchDB achieves eventual consistency between databases by using incremental replication, a process where document changes are periodically copied between servers. We are able to build what’s known as a shared nothing cluster of databases where each node is independent and self-sufficient, leaving no single point of contention across the system.

Need to scale out your CouchDB database cluster? Just throw in another server.

As illustrated in Figure 2-4, with CouchDB’s incremental replication, you can synchronize your data between any two databases however you like and whenever you like. After replication, each database is able to work independently.

You could use this feature to synchronize database servers within a cluster or between data centers using a job scheduler such as cron, or you could use it to synchronize data with your laptop for offline work as you travel. Each database can be used in the usual fashion, and changes between databases can be synchronized later in both directions.

Figure 2-4. Incremental replication between CouchDB nodes

What happens when you change the same document in two different databases and want to synchronize these with each other? CouchDB’s replication system comes with automatic conflict detection and resolution. When CouchDB detects that a document has been changed in both databases, it flags this document as being in conflict, much as a regular version control system would.

This isn’t as troublesome as it might first sound. When two versions of a document conflict during replication, the winning version is saved as the most recent version in the document’s history.
Instead of throwing the losing version away, as you might expect, CouchDB saves this as a previous version in the document’s history, so that you can access it if you need to. This happens automatically and consistently, so both databases will make exactly the same choice.

It is up to you to handle conflicts in a way that makes sense for your application. You can leave the chosen document versions in place, revert to the older version, or try to merge the two versions and save the result.

Case Study

Greg Borenstein, a friend and coworker, built a small library for converting Songbird playlists to JSON objects and decided to store these in CouchDB as part of a backup application. The completed software uses CouchDB’s MVCC and document revisions to ensure that Songbird playlists are backed up robustly between nodes.

Songbird is a free software media player with an integrated web browser, based on the Mozilla XULRunner platform. Songbird is available for Microsoft Windows, Apple Mac OS X, Solaris, and Linux.

Let’s examine the workflow of the Songbird backup application, first as a user backing up from a single computer, and then using Songbird to synchronize playlists between multiple computers. We’ll see how document revisions turn what could have been a hairy problem into something that just works.

The first time we use this backup application, we feed our playlists to the application and initiate a backup. Each playlist is converted to a JSON object and handed to a CouchDB database. As illustrated in Figure 2-5, CouchDB hands back the document ID and revision of each playlist as it’s saved to the database.

Figure 2-5. Backing up to a single database

After a few days, we find that our playlists have been updated and we want to back up our changes.
After we have fed our playlists to the backup application, it fetches the latest versions from CouchDB, along with the corresponding document revisions. When the application hands back the new playlist document, CouchDB requires that the document revision be included in the request.

CouchDB then makes sure that the document revision handed to it in the request matches the current revision held in the database. Because CouchDB updates the revision with every modification, if these two are out of synchronization it suggests that someone else has made changes to the document between the time we requested it from the database and the time we sent our updates. Making changes to a document after someone else has modified it without first inspecting those changes is usually a bad idea.

Forcing clients to hand back the correct document revision is the heart of CouchDB’s optimistic concurrency.

We have a laptop we want to keep synchronized with our desktop computer. With all our playlists on our desktop, the first step is to “restore from backup” onto our laptop. This is the first time we’ve done this, so afterward our laptop should hold an exact replica of our desktop playlist collection.

After editing our Argentine Tango playlist on our laptop to add a few new songs we’ve purchased, we want to save our changes. The backup application replaces the playlist document in our laptop CouchDB database and a new document revision is generated. A few days later, we remember our new songs and want to copy the playlist across to our desktop computer. As illustrated in Figure 2-6, the backup application copies the new document and the new revision to the desktop CouchDB database. Both CouchDB databases now have the same document revision.

Figure 2-6. Synchronizing between two databases

Because CouchDB tracks document revisions, it ensures that updates like these will work only if they are based on current information. If we had made modifications to the playlist backups between synchronizations, things wouldn’t go as smoothly.

We back up some changes on our laptop and forget to synchronize. A few days later, we’re editing playlists on our desktop computer, make a backup, and want to synchronize this to our laptop. As illustrated in Figure 2-7, when our backup application tries to replicate between the two databases, CouchDB sees that the changes being sent from our desktop computer are modifications of out-of-date documents and helpfully informs us that there has been a conflict.

Recovering from this error is easy to accomplish from an application perspective. Just download CouchDB’s version of the playlist and provide an opportunity to merge the changes or save local modifications into a new playlist.

Figure 2-7. Synchronization conflicts between two databases

Wrapping Up

CouchDB’s design borrows heavily from web architecture and the lessons learned deploying massively distributed systems on that architecture. By understanding why this architecture works the way it does, and by learning to spot which parts of your application can be easily distributed and which parts cannot, you’ll enhance your ability to design distributed and scalable applications, with CouchDB or without it.

We’ve covered the main issues surrounding CouchDB’s consistency model and hinted at some of the benefits to be had when you work with CouchDB and not against it. But enough theory—let’s get up and running and see what all the fuss is about!
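The revision check at the heart of the Songbird case study can be sketched in a few lines. The in-memory `RevStore` class is a hypothetical stand-in for a CouchDB database: an update is accepted only when it carries the revision currently stored, and otherwise it fails with a conflict, much as CouchDB answers such a request with HTTP status 409. Revisions here are simple counters, unlike CouchDB’s real revision strings.

```javascript
class ConflictError extends Error {
  constructor(id) {
    super(`Document update conflict: ${id}`);
    this.status = 409; // CouchDB reports this situation as 409 Conflict
  }
}

class RevStore {
  constructor() {
    this.docs = new Map(); // id -> { _id, _rev, ...fields }
  }

  get(id) {
    const doc = this.docs.get(id);
    return doc ? { ...doc } : undefined; // hand out a shallow copy
  }

  // Optimistic concurrency: accept the write only if doc._rev matches
  // the stored revision (a new document must come without a _rev).
  put(doc) {
    const current = this.docs.get(doc._id);
    const currentRev = current ? current._rev : undefined;
    if (doc._rev !== currentRev) {
      throw new ConflictError(doc._id);
    }
    const nextRev = String((current ? Number(current._rev) : 0) + 1);
    this.docs.set(doc._id, { ...doc, _rev: nextRev });
    return nextRev;
  }
}
```

No locks are held between reading and writing; a stale writer simply learns about the conflict and can fetch the current version, merge, and try again.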
CHAPTER 3 Getting Started

In this chapter, we’ll take a quick tour of CouchDB’s features, familiarizing ourselves with Futon, the built-in administration interface. We’ll create our first document and experiment with CouchDB views.

Before we start, skip to Appendix D and look for your operating system. You will need to follow those instructions and get CouchDB installed before you can progress.

All Systems Are Go!

We’ll have a very quick look at CouchDB’s bare-bones Application Programming Interface (API) by using the command-line utility curl. Please note that this is only one way of talking to CouchDB. We will show you plenty more throughout the rest of the book. What’s interesting about curl is that it gives you control over raw HTTP requests, and you can see exactly what is going on “underneath the hood” of your database.

Make sure CouchDB is still running, and then do:

curl http://127.0.0.1:5984/

This issues a GET request to your newly installed CouchDB instance. The reply should look something like:

{"couchdb":"Welcome","version":"0.10.1"}

Not all that spectacular. CouchDB is saying “hello” with the running version number.

Next, we can get a list of databases:

curl -X GET http://127.0.0.1:5984/_all_dbs

All we added to the previous request is the _all_dbs string. The response should look like:

[]

Oh, that’s right, we didn’t create any databases yet! All we see is an empty list.

The curl command issues GET requests by default. You can issue POST requests using curl -X POST. To make it easy to work with our terminal history, we usually use the -X option even when issuing GET requests. If we want to send a POST next time, all we have to change is the method.

HTTP does a bit more under the hood than you can see in the examples here.
If you’re interested in every last detail that goes over the wire, pass in the -v option (e.g., curl -vX GET), which will show you the server curl tries to connect to, the request headers it sends, and the response headers it receives back. Great for debugging!

Let’s create a database:

curl -X PUT http://127.0.0.1:5984/baseball

CouchDB will reply with:

{"ok":true}

Retrieving the list of databases again shows some useful results this time:

curl -X GET http://127.0.0.1:5984/_all_dbs

["baseball"]

We should mention JavaScript Object Notation (JSON) here, the data format CouchDB speaks. JSON is a lightweight data interchange format based on JavaScript syntax. Because JSON is natively compatible with JavaScript, your web browser is an ideal client for CouchDB.

Brackets ([]) represent ordered lists, and curly braces ({}) represent key/value dictionaries. Keys must be strings, delimited by double quotes ("), and values can be strings, numbers, booleans, null, lists, or key/value dictionaries. For a more detailed description of JSON, see Appendix E.

Let’s create another database:

curl -X PUT http://127.0.0.1:5984/baseball

CouchDB will reply with:

{"error":"file_exists","reason":"The database could not be created, the file already exists."}

We already have a database with that name, so CouchDB will respond with an error.
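Because CouchDB speaks HTTP and JSON, the curl calls above translate directly into JavaScript. The sketch below assumes a `fetch`-compatible function is passed in (browsers and Node 18+ provide a global `fetch`); injecting it keeps the example testable without a running server. The `couch` helper and its method names are hypothetical, and error handling is deliberately minimal: the status code and parsed JSON body are simply returned to the caller.

```javascript
// Minimal CouchDB helpers built on a fetch-compatible function.
// `fetchFn` and `baseUrl` are supplied by the caller (assumption).
function couch(fetchFn, baseUrl = "http://127.0.0.1:5984") {
  async function request(method, path) {
    const response = await fetchFn(`${baseUrl}${path}`, { method });
    const body = await response.json(); // CouchDB replies with JSON
    return { status: response.status, body };
  }

  return {
    welcome: () => request("GET", "/"),           // curl http://127.0.0.1:5984/
    allDbs: () => request("GET", "/_all_dbs"),    // curl -X GET .../_all_dbs
    createDb: (name) => request("PUT", `/${encodeURIComponent(name)}`),
    deleteDb: (name) => request("DELETE", `/${encodeURIComponent(name)}`),
  };
}
```

Against a real server, `await couch(fetch).createDb("baseball")` should yield the same `{"ok":true}` body that curl printed, and repeating it should yield the `file_exists` error body.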
Let’s try again with a different database name:

curl -X PUT http://127.0.0.1:5984/plankton

CouchDB will reply with:

{"ok":true}

Retrieving the list of databases yet again shows some useful results:

curl -X GET http://127.0.0.1:5984/_all_dbs

CouchDB will respond with:

["baseball", "plankton"]

To round things off, let’s delete the second database:

curl -X DELETE http://127.0.0.1:5984/plankton

CouchDB will reply with:

{"ok":true}

The list of databases is now the same as it was before:

curl -X GET http://127.0.0.1:5984/_all_dbs

CouchDB will respond with:

["baseball"]

For brevity, we’ll skip working with documents, as the next section covers a different and potentially easier way of working with CouchDB that will give you experience with documents. As we work through the example, keep in mind that “under the hood” everything is being done by the application exactly as you have been doing here manually. Everything is done using GET, PUT, POST, and DELETE with a URI.

Welcome to Futon

After having seen CouchDB’s raw API, let’s get our feet wet by playing with Futon, the built-in administration interface. Futon provides full access to all of CouchDB’s features and makes it easy to work with some of the more complex ideas involved. With Futon we can create and destroy databases; view and edit documents; compose and run MapReduce views; and trigger replication between databases.

To load Futon in your browser, visit:

http://127.0.0.1:5984/_utils/

If you’re running version 0.9 or later, you should see something similar to Figure 3-1.

In later chapters, we’ll focus on using CouchDB from server-side languages such as Ruby and Python. As such, this chapter is a great opportunity to showcase an example of natively serving up a dynamic web application using nothing more than CouchDB’s integrated web server, something you may wish to do with your own applications.
The first thing we should do with a fresh installation of CouchDB is run the test suite to verify that everything is working properly. This assures us that any problems we may run into aren’t due to bothersome issues with our setup. By the same token, failures in the Futon test suite are a red flag, telling us to double-check our installation before attempting to use a potentially broken database server, saving us the confusion when nothing seems to be working quite like we expect!

Figure 3-1. The Futon welcome screen

Some common network configurations cause the replication test to fail when accessed via the localhost address. You can fix this by accessing CouchDB via http://127.0.0.1:5984/_utils/.

Navigate to the test suite by clicking “Test Suite” on the Futon sidebar, then click “run all” at the top to kick things off. Figure 3-2 shows the Futon test suite running some tests.

Figure 3-2. The Futon test suite running some tests

Because the test suite is run from the browser, not only does it test that CouchDB is functioning properly, it also verifies that your browser’s connection to the database is properly configured, which can be very handy for diagnosing misbehaving proxies or other HTTP middleware.

If the test suite has an inordinate number of failures, you’ll need to see the troubleshooting section in Appendix D for the next steps to fix your installation.

Now that the test suite is finished, you’ve verified that your CouchDB installation is successful and you’re ready to see what else Futon has to offer.

Your First Database and Document

Creating a database in Futon is simple. From the overview page, click “Create Database.” When asked for a name, enter hello-world and click the Create button.

After your database has been created, Futon will display a list of all its documents. This list will start out empty (Figure 3-3), so let’s create our first document. Click the “Create Document” link and then the Create button in the pop-up. Make sure to leave the document ID blank, and CouchDB will generate a UUID for you.

Figure 3-3. An empty database in Futon

For demoing purposes, having CouchDB assign a UUID is fine. When you write your first programs, we recommend assigning your own UUIDs. If you rely on the server to generate the UUID and you end up making two POST requests because the first POST request bombed out, you might generate two docs and never find out about the first one because only the second one will be reported back. Generating your own UUIDs ensures that you’ll never end up with duplicate documents.

Futon will display the newly created document, with its _id and _rev as the only fields. To create a new field, click the “Add Field” button. We’ll call the new field hello. Click the green check icon (or hit the Enter key) to finalize creating the hello field. Double-click the hello field’s value (default null) to edit it.

If you try to enter world as the new value, you’ll get an error when you click the value’s green check icon. CouchDB values must be entered as valid JSON. Instead, enter "world" (with quotes) because this is a valid JSON string. You should have no problems saving it. You can experiment with other JSON values; e.g., [1, 2, "c"] or {"foo":"bar"}.

Once you’ve entered your values into the document, make a note of its _rev attribute and click “Save Document.” The result should look like Figure 3-4.

Figure 3-4. A “hello world” document in Futon

You’ll notice that the document’s _rev has changed.
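Returning to the advice about assigning your own UUIDs, here is a minimal sketch of why it prevents duplicates. The client generates the ID once, before the first attempt, and reuses it on every retry with a PUT to `/database/id`. If the first request actually succeeded but its response was lost, the retry collides with the existing document (CouchDB answers 409 Conflict) instead of silently creating a second one. The `send` function is a hypothetical stand-in for an HTTP call, and the 32-character hex format merely mirrors the look of CouchDB-generated IDs.

```javascript
const crypto = require("crypto");

// Random 128-bit ID as 32 hex characters.
function newDocId() {
  return crypto.randomBytes(16).toString("hex");
}

// Create a document under a client-chosen ID, retrying only on network errors.
// `send(method, path, body)` is a hypothetical async HTTP helper returning { status }.
async function createWithRetry(send, db, doc, attempts = 3) {
  const id = newDocId(); // chosen once, reused on every retry
  for (let attempt = 1; ; attempt += 1) {
    let res;
    try {
      res = await send("PUT", `/${db}/${id}`, doc);
    } catch (networkError) {
      if (attempt >= attempts) throw networkError;
      continue; // retry with the same ID
    }
    if (res.status === 201 || res.status === 202) return id;
    // 409: an earlier attempt already stored this ID. With random 128-bit IDs,
    // a clash with some other client's document is vanishingly unlikely.
    if (res.status === 409) return id;
    throw new Error(`unexpected status ${res.status}`);
  }
}
```

With server-assigned IDs, the same retry would have produced a second document under a fresh ID.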
We’ll go into more detail about revisions in later chapters, but for now, the important thing to note is that _rev acts like a safety feature when saving a document. As long as you and CouchDB agree on the most recent _rev of a document, you can successfully save your changes.

Futon also provides a way to display the underlying JSON data, which can be more compact and easier to read, depending on what sort of data you are dealing with. To see the JSON version of our “hello world” document, click the Source tab. The result should look like Figure 3-5.

Figure 3-5. The JSON source of a “hello world” document in Futon

Running a Query Using MapReduce

Traditional relational databases allow you to run any queries you like as long as your data is structured correctly. In contrast, CouchDB uses predefined map and reduce functions in a style known as MapReduce. These functions provide great flexibility because they can adapt to variations in document structure, and indexes for each document can be computed independently and in parallel. The combination of a map and a reduce function is called a view in CouchDB terminology.

For experienced relational database programmers, MapReduce can take some getting used to. Rather than declaring which rows from which tables to include in a result set and depending on the database to determine the most efficient way to run the query, MapReduce queries are based on simple range requests against the indexes generated by your map functions.

Map functions are called once with each document as the argument. The function can choose to skip the document altogether or emit one or more view rows as key/value pairs. Map functions may not depend on any information outside of the document. This independence is what allows CouchDB views to be generated incrementally and in parallel.

CouchDB views are stored as rows that are kept sorted by key.
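The view-building process just described (call the map function once per document, collect the emitted key/value rows, keep them sorted by key) can be sketched in a few lines. This is a simplified model, not CouchDB's implementation. In this sketch, emit is passed in as a second argument for convenience, whereas CouchDB provides it as a global. The hypothetical compareKeys orders only numbers, strings, and arrays, a reduced version of CouchDB's key ordering. The sketch also breaks ties between equal keys by document ID.

```javascript
// Simplified sketch of building a view: run the map function once per
// document, collect every emitted row, and keep the rows sorted by key.
// Assumptions: emit is passed in as a parameter here (in CouchDB it is a
// global), and compareKeys is a hypothetical, reduced ordering that only
// handles numbers < strings < arrays. CouchDB's real collation covers more
// types and compares strings with Unicode-aware rules.

function typeRank(v) {
  if (typeof v === "number") return 0;
  if (typeof v === "string") return 1;
  if (Array.isArray(v)) return 2;
  throw new Error("key type not handled in this sketch: " + typeof v);
}

function compareKeys(a, b) {
  const ra = typeRank(a);
  const rb = typeRank(b);
  if (ra !== rb) return ra - rb;
  if (ra === 0) return a - b;
  if (ra === 1) return a < b ? -1 : a > b ? 1 : 0;
  // Arrays compare element by element; a shorter prefix sorts first.
  for (let i = 0; i < Math.min(a.length, b.length); i++) {
    const c = compareKeys(a[i], b[i]);
    if (c !== 0) return c;
  }
  return a.length - b.length;
}

function buildView(docs, mapFn) {
  const rows = [];
  for (const doc of docs) {
    // Each emitted row remembers the ID of the document that produced it.
    const emit = (key, value) => rows.push({ id: doc._id, key, value });
    mapFn(doc, emit);
  }
  // Sort by key; this sketch orders rows with equal keys by document ID.
  return rows.sort(
    (x, y) => compareKeys(x.key, y.key) || (x.id < y.id ? -1 : x.id > y.id ? 1 : 0)
  );
}

// Hypothetical documents: one priced item and one without the needed fields.
const docs = [
  { _id: "a", item: "apple", prices: { "Fresh Mart": 1.59, "Apples Express": 0.79 } },
  { _id: "b", hello: "world" },
];

const rows = buildView(docs, (doc, emit) => {
  if (doc.item && doc.prices) { // skip documents that lack the fields
    for (const store in doc.prices) {
      emit([doc.item, doc.prices[store]], store);
    }
  }
});
// keys come out as ["apple", 0.79] then ["apple", 1.59]
console.log(rows.map((r) => r.key));
```

Because rows are ordered by key, any contiguous key range (for example, every row whose key starts with "apple") can be read off as one slice. That is why the text stresses placing related data under nearby keys.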
Keeping the rows sorted by key makes retrieving data from a range of keys efficient, even when there are thousands or millions of rows. When writing CouchDB map functions, your primary goal is to build an index that stores related data under nearby keys.

Before we can run an example MapReduce view, we’ll need some data to run it on. We’ll create documents carrying the price of various supermarket items as found at different stores. Let’s create documents for apples, oranges, and bananas. (Allow CouchDB to generate the _id and _rev fields. The values shown below are only examples; each of your documents will get its own.) Use Futon to create documents that have a final JSON structure that looks like this:

{
  "_id" : "bc2a41170621c326ec68382f846d5764",
  "_rev" : "2612672603",
  "item" : "apple",
  "prices" : {
    "Fresh Mart" : 1.59,
    "Price Max" : 5.99,
    "Apples Express" : 0.79
  }
}

This document should look like Figure 3-6 when entered into Futon.

Figure 3-6. An example document with apple prices in Futon

OK, now that that’s done, let’s create the document for oranges:

{
  "_id" : "bc2a41170621c326ec68382f846d5764",
  "_rev" : "2612672603",
  "item" : "orange",
  "prices" : {
    "Fresh Mart" : 1.99,
    "Price Max" : 3.19,
    "Citrus Circus" : 1.09
  }
}

And finally, the document for bananas:

{
  "_id" : "bc2a41170621c326ec68382f846d5764",
  "_rev" : "2612672603",
  "item" : "banana",
  "prices" : {
    "Fresh Mart" : 1.99,
    "Price Max" : 0.79,
    "Banana Montana" : 4.22
  }
}

Imagine we’re catering a big luncheon, but the client is very price-sensitive. To find the lowest prices, we’re going to create our first view, which shows each fruit sorted by price. Click “hello-world” to return to the hello-world overview, and then from the “select view” menu choose “Temporary view…” to create a new view. The result should look something like Figure 3-7.

Figure 3-7. A temporary view in Futon

Edit the map function, on the left, so that it looks like the following:

function(doc) {
  var store, price, value;
  if (doc.item && doc.prices) {
    for (store in doc.prices) {
      price = doc.prices[store];
      value = [doc.item, store];
      emit(price, value);
    }
  }
}

This is a JavaScript function that CouchDB runs for each of our documents as it computes the view. We’ll leave the reduce function blank for the time being.

Click “Run” and you should see result rows like in Figure 3-8, with the various items sorted by price. This map function could be even more useful if it grouped the items by type so that all the prices for bananas were next to each other in the result set. CouchDB’s key sorting system allows any valid JSON value, such as an array, to be used as a key. In this case, we’ll emit an array of [item, price] so that CouchDB groups by item type and price.

Figure 3-8. The results of running a view in Futon

Let’s modify the view function so that it looks like this:

function(doc) {
  var store, price, key;
  if (doc.item && doc.prices) {
    for (store in doc.prices) {
      price = doc.prices[store];
      key = [doc.item, price];
      emit(key, store);
    }
  }
}

Here, we first check that the document has the fields we want to use. CouchDB recovers gracefully from a few isolated map function failures, but when a map function fails regularly (due to a missing required field or other JavaScript exception), CouchDB shuts off its indexing to prevent any further resource usage. For this reason, it’s important to check for the existence of any fields before you use them. In this case, our map function will skip the first “hello world” document we created without emitting any rows or encountering any errors. The result of this query should look like Figure 3-9.

Figure 3-9. The results of running a view after grouping by item type and price

Once we know we’ve got a document with an item type and some prices, we iterate over the item’s prices and emit key/value pairs. The key is an array of the item and the price, and forms the basis for CouchDB’s sorted index. In this case, the value is the name of the store where the item can be found for the listed price.

View rows are sorted by their keys: in this example, first by item, then by price. This method of complex sorting is at the heart of creating useful indexes with CouchDB.

MapReduce can be challenging, especially if you’ve spent years working with relational databases. The important things to keep in mind are that map functions give you an opportunity to sort your data using any key you choose, and that CouchDB’s design is focused on providing fast, efficient access to data within a range of keys.

Triggering Replication

Futon can trigger replication between two local databases, between a local and remote database, or even between two remote databases. We’ll show you how to replicate data from one local database to another, which is a simple way of making backups of your databases as we’re working through the examples.

First we’ll need to create an empty database to be the target of replication. Return to the overview and create a database called hello-replication. Now click “Replicator” in the sidebar and choose hello-world as the source and hello-replication as the target. Click “Replicate” to replicate your database. The result should look something like Figure 3-10.

Figure 3-10. Running database replication in Futon

For larger databases, replication can take much longer. It is important to leave the browser window open while replication is taking place. As an alternative, you can trigger replication via curl or some other HTTP client that can handle long-running connections.
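To trigger replication from a script instead of Futon, a client sends a POST to the server's _replicate resource, which the next chapter introduces with curl. The sketch below assumes Node.js 18 or later (for the global fetch) and a CouchDB at http://127.0.0.1:5984. The function names are hypothetical. The request is built separately so it can be inspected without a running server.

```javascript
// Sketch: triggering replication from a script rather than the browser.
// Assumptions: Node.js 18+ (global fetch), CouchDB at http://127.0.0.1:5984,
// and the POST /_replicate request that Chapter 4 shows with curl.

// Build the request separately so it can be inspected without a server.
function replicationRequest(baseUrl, source, target) {
  return {
    url: new URL("/_replicate", baseUrl).toString(),
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ source, target }),
    },
  };
}

async function replicate(baseUrl, source, target) {
  const { url, init } = replicationRequest(baseUrl, source, target);
  // The response arrives only after replication finishes, so this await can
  // take a long time; the client must not give up (time out) too early.
  const res = await fetch(url, init);
  const body = await res.json();
  if (!res.ok) {
    throw new Error(`replication failed (${res.status}): ${JSON.stringify(body)}`);
  }
  return body;
}

// Usage against a running server:
// replicate("http://127.0.0.1:5984", "hello-world", "hello-replication")
//   .then((result) => console.log(result.ok));
```

Whether an HTTP client's default timeouts are long enough for a large database depends on the client. Check its settings before relying on a sketch like this for big replications.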
If your client closes the connection before replication finishes, you’ll have to retrigger it. Luckily, CouchDB’s replication can take over from where it left off instead of starting from scratch.

Wrapping Up

Now that you’ve seen most of Futon’s features, you’ll be prepared to dive in and inspect your data as we build our example application in the next few chapters. Futon’s pure JavaScript approach to managing CouchDB shows how it’s possible to build a fully featured web application using only CouchDB’s HTTP API and integrated web server.

But before we get there, we’ll have another look at CouchDB’s HTTP API, now with a magnifying glass. Let’s curl on the couch and relax.

CHAPTER 4
The Core API

This chapter explores the CouchDB API in minute detail. It shows all the nitty-gritty and clever bits. We show you best practices and guide you around common pitfalls.

We start out by revisiting the basic operations we ran in the last chapter, looking behind the scenes. We also show what Futon needs to do behind its user interface to give us the nice features we saw earlier.

This chapter is both an introduction to the core CouchDB API and a reference. If you can’t remember how to run a particular request or why some parameters are needed, you can always come back here and look things up (we are probably the heaviest users of this chapter).

While explaining the API bits and pieces, we sometimes need to take a larger detour to explain the reasoning for a particular request. This is a good opportunity for us to tell you why CouchDB works the way it does.

The API can be subdivided into the following sections. We’ll explore them individually:

• Server
• Databases
• Documents
• Replication

Server

This one is basic and simple. It can serve as a sanity check to see if CouchDB is running at all.
It can also act as a safety guard for libraries that require a certain version of CouchDB. We’re using the curl utility again:

curl http://127.0.0.1:5984/

CouchDB replies, all excited to get going:

{"couchdb":"Welcome","version":"0.10.1"}

You get back a JSON string that, if parsed into a native object or data structure of your programming language, gives you access to the welcome string and version information.

This is not terribly useful, but it illustrates nicely the way CouchDB behaves. You send an HTTP request and you receive a JSON string in the HTTP response as a result.

Databases

Now let’s do something a little more useful: create databases. For the strict, CouchDB is a database management system (DMS). That means it can hold multiple databases. A database is a bucket that holds “related data.” We’ll explore later what that means exactly. In practice, the terminology is overlapping: often people refer to a DMS as “a database” and also a database within the DMS as “a database.” We might follow that slight oddity, so don’t get confused by it. In general, it should be clear from the context if we are talking about the whole of CouchDB or a single database within CouchDB.

Now let’s make one! We want to store our favorite music albums, and we creatively give our database the name albums. Note that we’re now using the -X option again to tell curl to send a PUT request instead of the default GET request:

curl -X PUT http://127.0.0.1:5984/albums

CouchDB replies:

{"ok":true}

That’s it. You created a database and CouchDB told you that all went well. What happens if you try to create a database that already exists? Let’s try to create that database again:

curl -X PUT http://127.0.0.1:5984/albums

CouchDB replies:

{"error":"file_exists","reason":"The database could not be created, the file already exists."}

We get back an error. This is pretty convenient.
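Here is how a client library might use the JSON replies shown above: check the server's version before proceeding, and treat "database already exists" as a non-fatal outcome. Both helpers are hypothetical, not part of any real CouchDB client. The version check compares numerically, part by part, because a plain string comparison would sort "0.10.1" before "0.9.0".

```javascript
// Sketch: what a client library might do with the JSON shown above.
// requireVersion and createDbOutcome are hypothetical helpers, not part of
// any CouchDB client. Versions are compared numerically, part by part,
// because a plain string comparison would sort "0.10.1" before "0.9.0".

function versionAtLeast(actual, minimum) {
  const a = actual.split(".").map((p) => parseInt(p, 10) || 0);
  const m = minimum.split(".").map((p) => parseInt(p, 10) || 0);
  for (let i = 0; i < Math.max(a.length, m.length); i++) {
    const x = a[i] || 0;
    const y = m[i] || 0;
    if (x !== y) return x > y;
  }
  return true; // equal versions satisfy the minimum
}

// The "safety guard": refuse to talk to a server older than we support.
function requireVersion(welcomeJson, minimum) {
  const info = JSON.parse(welcomeJson);
  if (info.couchdb !== "Welcome") throw new Error("not a CouchDB server");
  if (!versionAtLeast(info.version, minimum)) {
    throw new Error(`CouchDB ${info.version} is older than ${minimum}`);
  }
  return info.version;
}

// Interpret the reply to PUT /dbname: treat "already exists" as non-fatal.
function createDbOutcome(replyJson) {
  const reply = JSON.parse(replyJson);
  if (reply.ok) return "created";
  if (reply.error === "file_exists") return "exists";
  throw new Error(`${reply.error}: ${reply.reason}`);
}

console.log(requireVersion('{"couchdb":"Welcome","version":"0.10.1"}', "0.9.0")); // 0.10.1
console.log(createDbOutcome('{"ok":true}')); // created
```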
We also learn a little bit about how CouchDB works. CouchDB stores each database in a single file. Very simple. This has some consequences down the road, but we’ll skip the details for now and explore the underlying storage system in Appendix F.

Let’s create another database, this time with curl’s -v (for “verbose”) option. The verbose option tells curl to show us not only the essentials (the HTTP response body) but all the underlying request and response details:

curl -vX PUT http://127.0.0.1:5984/albums-backup

curl elaborates:

* About to connect() to 127.0.0.1 port 5984 (#0)
*   Trying 127.0.0.1... connected
* Connected to 127.0.0.1 (127.0.0.1) port 5984 (#0)
> PUT /albums-backup HTTP/1.1
> User-Agent: curl/7.16.3 (powerpc-apple-darwin9.0) libcurl/7.16.3 OpenSSL/0.9.7l zlib/1.2.3
> Host: 127.0.0.1:5984
> Accept: */*
>
< HTTP/1.1 201 Created
< Server: CouchDB/0.9.0 (Erlang OTP/R12B)
< Date: Sun, 05 Jul 2009 22:48:28 GMT
< Content-Type: text/plain;charset=utf-8
< Content-Length: 12
< Cache-Control: must-revalidate
<
{"ok":true}
* Connection #0 to host 127.0.0.1 left intact
* Closing connection #0

What a mouthful. Let’s step through this line by line to understand what’s going on and find out what’s important. Once you’ve seen this output a few times, you’ll be able to spot the important bits more easily.

* About to connect() to 127.0.0.1 port 5984 (#0)

This is curl telling us that it is going to establish a TCP connection to the CouchDB server we specified in our request URI. Not at all important, except when debugging networking issues.

*   Trying 127.0.0.1... connected
* Connected to 127.0.0.1 (127.0.0.1) port 5984 (#0)

curl tells us it successfully connected to CouchDB. Again, not important if you aren’t trying to find problems with your network.

The following lines are prefixed with > and < characters.
> means the line was sent to CouchDB verbatim (without the actual >). < means the line was sent back to curl by CouchDB.

> PUT /albums-backup HTTP/1.1

This initiates an HTTP request. Its method is PUT, the URI is /albums-backup, and the HTTP version is HTTP/1.1. There is also HTTP/1.0, which is simpler in some cases, but for all practical reasons you should be using HTTP/1.1.

Next, we see a number of request headers. These are used to provide additional details about the request to CouchDB.

> User-Agent: curl/7.16.3 (powerpc-apple-darwin9.0) libcurl/7.16.3 OpenSSL/0.9.7l zlib/1.2.3

The User-Agent header tells CouchDB which piece of client software is doing the HTTP request. We don’t learn anything new: it’s curl. This header is often useful in web development when there are known errors in client implementations that a server might want to prepare the response for. It also helps to determine which platform a user is on. This information can be used for technical and statistical reasons. For CouchDB, the User-Agent header is irrelevant.

> Host: 127.0.0.1:5984

The Host header is required by HTTP 1.1. It tells the server the hostname that came with the request.

> Accept: */*

The Accept header tells CouchDB that curl accepts any media type. We’ll look into why this is useful a little later.

>

An empty line denotes that the request headers are now finished and the rest of the request contains data we’re sending to the server. In this case, we’re not sending any data, so the rest of the curl output is dedicated to the HTTP response.

< HTTP/1.1 201 Created

The first line of CouchDB’s HTTP response includes the HTTP version information (again, to acknowledge that the requested version could be processed), an HTTP status code, and a status code message. Different requests trigger different response codes.
There’s a whole range of them, telling the client (curl in our case) what effect the request had on the server or, if an error occurred, what kind of error it was. RFC 2616 (the HTTP 1.1 specification) defines clear behavior for response codes. CouchDB fully follows the RFC.

The 201 Created status code tells the client that the resource the request was made against was successfully created. No surprise here, but if you remember that we got an error message when we tried to create this database twice, you now know that this response could include a different response code. Acting upon responses based on response codes is a common practice. For example, all response codes of 400 or larger tell you that some error occurred. If you want to shortcut your logic and immediately deal with the error, you could just check for a response code >= 400.

< Server: CouchDB/0.10.1 (Erlang OTP/R13B)

The Server header is good for diagnostics. It tells us which CouchDB version and which underlying Erlang version we are talking to. In general, you can ignore this header, but it is good to know it’s there if you need it.

< Date: Sun, 05 Jul 2009 22:48:28 GMT

The Date header tells you the time of the server. Since client and server time are not necessarily synchronized, this header is purely informational. You shouldn’t build any critical application logic on top of this!

< Content-Type: text/plain;charset=utf-8

The Content-Type header tells you the MIME type of the HTTP response body and its encoding. We already know CouchDB returns JSON strings. The appropriate Content-Type header is application/json. Why do we see text/plain? This is where pragmatism wins over purity. Sending an application/json Content-Type header will make a browser offer you the returned JSON for download instead of just displaying it.
Since it is extremely useful to be able to test CouchDB from a browser, CouchDB sends a text/plain content type, so all browsers will display the JSON as text. There are some browser extensions that make your browser JSON-aware, but they are not installed by default. Do you remember the Accept request header and how it is set to */* to express interest in any MIME type? If you send Accept: application/json in your request, CouchDB knows that you can deal with a pure JSON response with the proper Content-Type header and will use it instead of text/plain.

< Content-Length: 12

The Content-Length header simply tells us how many bytes the response body has.

< Cache-Control: must-revalidate

This Cache-Control header tells you, or any proxy server between CouchDB and you, that a stored copy of this response must not be reused once it is stale without first revalidating it with CouchDB.

<

This empty line tells us we’re done with the response headers and what follows now is the response body.

{"ok":true}

We’ve seen this before.

* Connection #0 to host 127.0.0.1 left intact
* Closing connection #0

The last two lines are curl telling us that it kept the TCP connection it opened in the beginning open for a moment, but then closed it after it received the entire response.

Throughout the book, we’ll show more requests with the -v option, but we’ll omit some of the headers we’ve seen here and include only those that are important for the particular request.

Creating databases is all fine, but how do we get rid of one? Easy: just change the HTTP method:

curl -vX DELETE http://127.0.0.1:5984/albums-backup

This deletes a CouchDB database. The request will remove the file that the database contents are stored in. There is no “Are you sure?” safety net or any “Empty the trash” magic you’ve got to do to delete a database. Use this command with care. Your data will be deleted without a chance to bring it back easily if you don’t have a backup copy.

This section went knee-deep into HTTP and set the stage for discussing the rest of the core CouchDB API.
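To make the walkthrough of the response head concrete, the sketch below pulls the status code and Content-Type out of a raw response head like the one curl -v printed. It also applies the ">= 400 means error" shortcut from the text. parseResponseHead is a hypothetical helper for illustration only, since real HTTP client libraries already hand you parsed status codes and headers. If you send Accept: application/json, the text says the media type you get back should be application/json rather than text/plain.

```javascript
// Sketch: reading the pieces of a response head like the one curl -v printed
// above. parseResponseHead is a hypothetical helper for illustration; real
// HTTP client libraries hand you the status and headers already parsed.

function parseResponseHead(raw) {
  const lines = raw.split(/\r?\n/);
  const [version, code, ...reason] = lines[0].split(" ");
  const headers = {};
  for (const line of lines.slice(1)) {
    if (line === "") break; // the empty line ends the headers
    const colon = line.indexOf(":");
    // Header names are case-insensitive, so store them lowercased.
    headers[line.slice(0, colon).trim().toLowerCase()] = line.slice(colon + 1).trim();
  }
  return { version, status: Number(code), reason: reason.join(" "), headers };
}

// The shortcut from the text: any status of 400 or above is an error.
function isError(status) {
  return status >= 400;
}

// Drop parameters such as ";charset=utf-8" to get the bare media type.
function mediaType(contentType) {
  return contentType.split(";")[0].trim().toLowerCase();
}

const head = parseResponseHead(
  "HTTP/1.1 201 Created\r\n" +
    "Date: Sun, 05 Jul 2009 22:48:28 GMT\r\n" +
    "Content-Type: text/plain;charset=utf-8\r\n" +
    "Content-Length: 12\r\n" +
    "Cache-Control: must-revalidate\r\n" +
    "\r\n"
);
console.log(head.status, isError(head.status), mediaType(head.headers["content-type"]));
// 201 false text/plain
```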
Next stop: documents.

Documents

Documents are CouchDB’s central data structure. The idea behind a document is, unsurprisingly, that of a real-world document: a sheet of paper such as an invoice, a recipe, or a business card. We already learned that CouchDB uses the JSON format to store documents. Let’s see how this storing works at the lowest level.

Each document in CouchDB has an ID. This ID is unique per database. You are free to choose any string to be the ID, but for best results we recommend a UUID (or GUID), i.e., a Universally (or Globally) Unique IDentifier. UUIDs are random numbers that have such a low collision probability that everybody can make thousands of UUIDs a minute for millions of years without ever creating a duplicate. This is a great way to ensure two independent people cannot create two different documents with the same ID. Why should you care what somebody else is doing? For one, that somebody else could be you at a later time or on a different computer; secondly, CouchDB replication lets you share documents with others, and using UUIDs ensures that it all works. But more on that later; let’s make some documents:

curl -X PUT http://127.0.0.1:5984/albums/6e1295ed6c29495e54cc05947f18c8af \
     -d '{"title":"There is Nothing Left to Lose","artist":"Foo Fighters"}'

CouchDB replies:

{"ok":true,"id":"6e1295ed6c29495e54cc05947f18c8af","rev":"1-2902191555"}

The curl command appears complex, but let’s break it down. First, -X PUT tells curl to make a PUT request. It is followed by the URL that specifies your CouchDB IP address and port. The resource part of the URL, /albums/6e1295ed6c29495e54cc05947f18c8af, specifies the location of a document inside our albums database. The wild collection of numbers and characters is a UUID. This UUID is your document’s ID. Finally, the -d flag tells curl to use the following string as the body for the PUT request.
The string is a simple JSON structure including title and artist attributes with their respective values.

If you don’t have a UUID handy, you can ask CouchDB to give you one (in fact, that is what we did just now without showing you). Simply send a GET request to /_uuids:

curl -X GET http://127.0.0.1:5984/_uuids

CouchDB replies:

{"uuids":["6e1295ed6c29495e54cc05947f18c8af"]}

Voilà, a UUID. If you need more than one, you can pass in the ?count=10 HTTP parameter (/_uuids?count=10) to request 10 UUIDs, or really, any number you need.

To double-check that CouchDB isn’t lying about having saved your document (it usually doesn’t), try to retrieve it by sending a GET request:

curl -X GET http://127.0.0.1:5984/albums/6e1295ed6c29495e54cc05947f18c8af

We hope you see a pattern here. Everything in CouchDB has an address, a URI, and you use the different HTTP methods to operate on these URIs.

CouchDB replies:

{"_id":"6e1295ed6c29495e54cc05947f18c8af","_rev":"1-2902191555","title":"There is Nothing Left to Lose","artist":"Foo Fighters"}

This looks a lot like the document you asked CouchDB to save, which is good. But you should notice that CouchDB added two fields to your JSON structure. The first is _id, which holds the UUID we asked CouchDB to save our document under. We always know the ID of a document if it is included, which is very convenient.

The second field is _rev. It stands for revision.

Revisions

If you want to change a document in CouchDB, you don’t tell it to go and find a field in a specific document and insert a new value. Instead, you load the full document out of CouchDB, make your changes in the JSON structure (or object, when you are doing actual programming), and save the entire new revision (or version) of that document back into CouchDB. Each revision is identified by a new _rev value.
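The load, modify, and save cycle, together with the revision check CouchDB applies on every update, can be modeled with a tiny in-memory store. This RevStore is hypothetical and is not CouchDB's implementation. It keeps only the latest revision of each document. The part of each revision string after the "N-" prefix is an MD5 of the JSON body, chosen purely for illustration. Only the rule matters here: an update must carry the _rev it was based on.

```javascript
// Sketch of the load-modify-save cycle with revision checks, using a tiny
// in-memory store. This is not CouchDB's implementation: the part after the
// "N-" prefix is an MD5 of the JSON body chosen only for illustration, and
// the store keeps just the latest revision of each document.
const crypto = require("node:crypto");

class RevStore {
  constructor() {
    this.docs = new Map(); // id -> latest stored document
  }

  get(id) {
    const doc = this.docs.get(id);
    return doc ? { ...doc } : undefined; // a copy, like a fresh GET
  }

  put(id, body) {
    const current = this.docs.get(id);
    // Updating an existing document requires the _rev it was based on;
    // a missing or stale _rev means we might overwrite someone else's change.
    if (current && body._rev !== current._rev) {
      return { error: "conflict", reason: "Document update conflict." };
    }
    const generation = current ? parseInt(current._rev.split("-")[0], 10) + 1 : 1;
    const { _id, _rev, ...fields } = body;
    const hash = crypto.createHash("md5").update(JSON.stringify(fields)).digest("hex");
    const rev = `${generation}-${hash}`;
    this.docs.set(id, { _id: id, _rev: rev, ...fields });
    return { ok: true, id, rev };
  }
}

const store = new RevStore();
const first = store.put("album", { title: "There is Nothing Left to Lose" });
// No _rev: rejected, just like the curl example that follows.
const rejected = store.put("album", { title: "There is Nothing Left to Lose", year: "1997" });
// Load, modify, and save the whole document with the _rev we loaded.
const doc = store.get("album");
doc.year = "1997";
const second = store.put("album", doc);
console.log(first.rev.split("-")[0], rejected.error, second.rev.split("-")[0]); // 1 conflict 2
```

Saving the same doc object a second time would now fail, because it still carries the first revision. That is the "whoever saves first wins" rule in action.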
If you want to update or delete a document, CouchDB expects you to include the _rev field of the revision you wish to change. When CouchDB accepts the change, it will generate a new revision number. This mechanism ensures that, in case somebody else made a change unbeknownst to you before you got to request the document update, CouchDB will not accept your update, because you are likely to overwrite data you didn’t know existed. Or simplified: whoever saves a change to a document first, wins. Let’s see what happens if we don’t provide a _rev field (which is equivalent to providing an outdated value):

curl -X PUT http://127.0.0.1:5984/albums/6e1295ed6c29495e54cc05947f18c8af \
     -d '{"title":"There is Nothing Left to Lose","artist":"Foo Fighters","year":"1997"}'

CouchDB replies:

{"error":"conflict","reason":"Document update conflict."}

If you see this, add the latest revision number of your document to the JSON structure:

curl -X PUT http://127.0.0.1:5984/albums/6e1295ed6c29495e54cc05947f18c8af \
     -d '{"_rev":"1-2902191555","title":"There is Nothing Left to Lose","artist":"Foo Fighters","year":"1997"}'

Now you see why it was handy that CouchDB returned that _rev when we made the initial request. CouchDB replies:

{"ok":true,"id":"6e1295ed6c29495e54cc05947f18c8af","rev":"2-2739352689"}

CouchDB accepted your write and also generated a new revision number. The revision number is the MD5 hash of the transport representation of a document with an N- prefix denoting the number of times a document got updated. This is useful for replication. See Chapter 17 for more information.

There are multiple reasons why CouchDB uses this revision system, which is also called Multi-Version Concurrency Control (MVCC). They all work hand-in-hand, and this is a good opportunity to explain some of them.

One of the aspects of the HTTP protocol that CouchDB uses is that it is stateless. What does that mean?
When talking to CouchDB you need to make requests. Making a request includes opening a network connection to CouchDB, exchanging bytes, and closing the connection. This is done every time you make a request. Other protocols allow you to open a connection, exchange bytes, keep the connection open, exchange more bytes later (maybe depending on the bytes you exchanged at the beginning), and eventually close the connection. Holding a connection open for later use requires the server to do extra work. One common pattern is that for the lifetime of a connection, the client has a consistent and static view of the data on the server. Managing huge numbers of parallel connections is a significant amount of work. HTTP connections are usually short-lived, and making the same guarantees is a lot easier. As a result, CouchDB can handle many more concurrent connections.

Another reason CouchDB uses MVCC is that this model is simpler conceptually and, as a consequence, easier to program. CouchDB uses less code to make this work, and less code is always good because the ratio of defects per line of code is static.

The revision system also has positive effects on replication and storage mechanisms, but we’ll explore these later in the book.

The terms version and revision might sound familiar (if you are programming without version control, drop this book right now and start learning one of the popular systems). Using new versions for document changes works a lot like version control, but there’s an important difference: CouchDB does not guarantee that older versions are kept around.

Documents in Detail

Now let’s have a closer look at our document creation requests with the curl -v flag that was helpful when we explored the database API earlier. This is also a good opportunity to create more documents that we can use in later examples.

We’ll add some more of our favorite music albums.
Get a fresh UUID from the /_uuids resource. If you don’t remember how that works, you can look it up a few pages back.

curl -vX PUT http://127.0.0.1:5984/albums/70b50bfa0a4b3aed1f8aff9e92dc16a0 \
     -d '{"title":"Blackened Sky","artist":"Biffy Clyro","year":2002}'

By the way, if you happen to know more information about your favorite albums, don’t hesitate to add more properties. And don’t worry about not knowing all the information for all the albums. CouchDB’s schemaless documents can contain whatever you know. After all, you should relax and not worry about data.

Now with the -v option, CouchDB’s reply (with only the important bits shown) looks like this:

> PUT /albums/70b50bfa0a4b3aed1f8aff9e92dc16a0 HTTP/1.1
>
< HTTP/1.1 201 Created
< Location: http://127.0.0.1:5984/albums/70b50bfa0a4b3aed1f8aff9e92dc16a0
< Etag: "1-2248288203"
<
{"ok":true,"id":"70b50bfa0a4b3aed1f8aff9e92dc16a0","rev":"1-2248288203"}

We’re getting back the 201 Created HTTP status code in the response, as we saw earlier when we created a database. The Location header gives us a full URL to our newly created document. And there’s a new header. An Etag in HTTP-speak identifies a specific version of a resource. In this case, it identifies a specific version (the first one) of our new document. Sound familiar? Yes, conceptually, an Etag is the same as a CouchDB document revision number, and it shouldn’t come as a surprise that CouchDB uses revision numbers for Etags. Etags are useful for caching infrastructures. We’ll learn how to use them in Chapter 8.

Attachments

CouchDB documents can have attachments just like an email message can have attachments. An attachment is identified by a name and includes its MIME type (or Content-Type) and the number of bytes the attachment contains. Attachments can be any data. It is easiest to think about attachments as files attached to a document. These files can be text, images, Word documents, music, or movie files. Let’s make one.
Attachments get their own URL where you can upload data. Say we want to add the album artwork to the 6e1295ed6c29495e54cc05947f18c8af document (“There is Nothing Left to Lose”), and let’s also say the artwork is in a file artwork.jpg in the current directory:

curl -vX PUT "http://127.0.0.1:5984/albums/6e1295ed6c29495e54cc05947f18c8af/artwork.jpg?rev=2-2739352689" \
     --data-binary @artwork.jpg -H "Content-Type: image/jpg"

The --data-binary @ option tells curl to read a file’s contents into the HTTP request body. We’re using the -H option to tell CouchDB that we’re uploading a JPEG file. CouchDB will keep this information around and will send the appropriate header when requesting this attachment; in case of an image like this, a browser will render the image instead of offering you the data for download. This will come in handy later. Note that you need to provide the current revision number of the document you’re attaching the artwork to, just as if you would update the document. Because, after all, attaching some data is changing the document.

You should now see your artwork image if you point your browser to http://127.0.0.1:5984/albums/6e1295ed6c29495e54cc05947f18c8af/artwork.jpg.

If you request the document again, you’ll see a new member:

curl http://127.0.0.1:5984/albums/6e1295ed6c29495e54cc05947f18c8af

CouchDB replies:

{"_id":"6e1295ed6c29495e54cc05947f18c8af","_rev":"3-131533518","title":"There is Nothing Left to Lose","artist":"Foo Fighters","year":"1997","_attachments":{"artwork.jpg":{"stub":true,"content_type":"image/jpg","length":52450}}}

_attachments is an object whose keys are attachment names and whose values are JSON objects containing the attachment metadata. "stub":true tells us that this entry is just the metadata. If we use the ?attachments=true HTTP option when requesting this document, we’d get a Base64-encoded string containing the attachment data.
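Working with the _attachments member in code mostly means telling stub entries (metadata only) from inline ones. With ?attachments=true, CouchDB puts the content Base64-encoded in a data field. The two helpers below are hypothetical: one summarizes a document's attachments, and the other builds the attachment URL used in the curl command, adding the rev parameter that writes require.

```javascript
// Sketch: working with the _attachments member shown above.
// describeAttachments and attachmentUrl are hypothetical helpers. Stub
// entries carry only metadata; an inline entry (as returned with
// ?attachments=true) carries the content Base64-encoded in a "data" field.

function describeAttachments(doc) {
  return Object.entries(doc._attachments || {}).map(([name, meta]) => ({
    name,
    contentType: meta.content_type,
    // Stubs report their size in "length"; inline data is decoded to count bytes.
    bytes: meta.stub ? meta.length : Buffer.from(meta.data, "base64").length,
    inline: !meta.stub,
  }));
}

// URL for reading or writing one attachment; writes need the current rev.
function attachmentUrl(base, db, docId, name, rev) {
  const path = [db, docId, name].map(encodeURIComponent).join("/");
  const url = new URL(path, base.endsWith("/") ? base : base + "/");
  if (rev) url.searchParams.set("rev", rev);
  return url.toString();
}

const doc = JSON.parse(
  '{"_id":"6e1295ed6c29495e54cc05947f18c8af","_rev":"3-131533518",' +
    '"_attachments":{"artwork.jpg":{"stub":true,"content_type":"image/jpg","length":52450}}}'
);
console.log(describeAttachments(doc));
console.log(attachmentUrl("http://127.0.0.1:5984", "albums", doc._id, "artwork.jpg", "2-2739352689"));
// http://127.0.0.1:5984/albums/6e1295ed6c29495e54cc05947f18c8af/artwork.jpg?rev=2-2739352689
```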
We’ll have a look at more document request options later as we explore more features of CouchDB, such as replication, which is the next topic.

Replication

CouchDB replication is a mechanism to synchronize databases. Much like rsync synchronizes two directories locally or over a network, replication synchronizes two databases locally or remotely.

In a simple POST request, you tell CouchDB the source and the target of a replication, and CouchDB will figure out which documents and new document revisions are on the source that are not yet on the target, and will proceed to move the missing documents and revisions over.

We’ll take an in-depth look at replication later in the book; in this chapter, we’ll just show you how to use it.

First, we’ll create a target database. Note that CouchDB won’t automatically create a target database for you, and will return a replication failure if the target doesn’t exist (likewise for the source, but that mistake isn’t as easy to make):

curl -X PUT http://127.0.0.1:5984/albums-replica

Now we can use the database albums-replica as a replication target:

curl -vX POST http://127.0.0.1:5984/_replicate \
     -d '{"source":"albums","target":"albums-replica"}'

As of version 0.11, CouchDB supports the option "create_target":true placed in the JSON POSTed to the _replicate URL. It implicitly creates the target database if it doesn’t exist.

CouchDB replies (this time we formatted the output so you can read it more easily):

{
  "history": [
    {
      "start_last_seq": 0,
      "missing_found": 2,
      "docs_read": 2,
      "end_last_seq": 5,
      "missing_checked": 2,
      "docs_written": 2,
      "doc_write_failures": 0,
      "end_time": "Sat, 11 Jul 2009 17:36:21 GMT",
      "start_time": "Sat, 11 Jul 2009 17:36:20 GMT"
    }
  ],
  "source_last_seq": 5,
  "session_id": "924e75e914392343de89c99d29d06671",
  "ok": true
}

CouchDB maintains a session history of replications.
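A script that triggers replication will usually want a short summary of the response shown above rather than the whole history. The summarizeReplication helper below is hypothetical. It assumes that the entry for the session just run is the first element of history, so check that assumption against your CouchDB version before relying on it. The field names are the ones in the response above.

```javascript
// Sketch: pulling a summary out of the replication response shown above.
// summarizeReplication is a hypothetical helper. It assumes the entry for the
// session just run is the first element of "history"; check that against your
// CouchDB version before relying on it.

function summarizeReplication(reply) {
  if (reply.ok !== true) throw new Error("replication did not report ok");
  const latest = reply.history[0];
  const seconds = (Date.parse(latest.end_time) - Date.parse(latest.start_time)) / 1000;
  return {
    session: reply.session_id,
    docsWritten: latest.docs_written,
    writeFailures: latest.doc_write_failures,
    // Did this session get as far as the source's latest sequence number?
    caughtUp: latest.end_last_seq === reply.source_last_seq,
    seconds,
  };
}

const reply = {
  history: [
    {
      start_last_seq: 0,
      missing_found: 2,
      docs_read: 2,
      end_last_seq: 5,
      missing_checked: 2,
      docs_written: 2,
      doc_write_failures: 0,
      end_time: "Sat, 11 Jul 2009 17:36:21 GMT",
      start_time: "Sat, 11 Jul 2009 17:36:20 GMT",
    },
  ],
  source_last_seq: 5,
  session_id: "924e75e914392343de89c99d29d06671",
  ok: true,
};
console.log(summarizeReplication(reply));
```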
The response for a replication request contains the history entry for this replication session. It is also worth noting that the request for replication will stay open until replication completes. If you have a lot of documents, it’ll take a while until they are all replicated, and you won’t get back the replication response until all documents are replicated. It is important to note that replication replicates the database only as it was at the point in time when replication was started. So, any additions, modifications, or deletions subsequent to the start of replication will not be replicated.

We’ll punt on the details again; the "ok": true at the end tells us all went well. If you now have a look at the albums-replica database, you should see all the documents that you created in the albums database. Neat, eh?

What you just did is called local replication in CouchDB terms. You created a local copy of a database. This is useful for backups or to keep snapshots of a specific state of your data around for later. You might want to do this if you are developing your applications but want to be able to roll back to a stable version of your code and data.

There are more types of replication useful in other situations. The source and target members of our replication request are actually links (like in HTML), and so far we’ve seen links relative to the server we’re working on (hence local). You can also specify a remote database as the target:

curl -vX POST http://127.0.0.1:5984/_replicate \
     -d '{"source":"albums","target":"http://127.0.0.1:5984/albums-replica"}'

Using a local source and a remote target database is called push replication. We’re pushing changes to a remote server.

Since we don’t have a second CouchDB server around just yet, we’ll just use the absolute address of our single server, but you should be able to infer from this that you can put any remote server in there.
This is great for sharing local changes with remote servers or buddies next door.

You can also use a remote source and a local target to do a pull replication. This is great for getting the latest changes from a server that is used by others:

curl -vX POST http://127.0.0.1:5984/_replicate \
     -d '{"source":"http://127.0.0.1:5984/albums-replica","target":"albums"}'

Finally, you can run remote replication, which is mostly useful for management operations:

curl -vX POST http://127.0.0.1:5984/_replicate \
     -d '{"source":"http://127.0.0.1:5984/albums", "target":"http://127.0.0.1:5984/albums-replica"}'

CouchDB and REST

CouchDB prides itself on having a RESTful API, but these replication requests don’t look very RESTy to the trained eye. What’s up with that? While CouchDB’s core database, document, and attachment APIs are RESTful, not all of CouchDB’s API is. The replication API is one example. There are more, as we’ll see later in the book.

Why are there RESTful and non-RESTful APIs mixed up here? Have the developers been too lazy to go REST all the way? Remember, REST is an architectural style that lends itself to certain architectures (such as the CouchDB document API). But it is not one-size-fits-all. Triggering an event like replication does not make a whole lot of sense in the REST world. It is more like a traditional remote procedure call. And there is nothing wrong with this.

We very much believe in the “use the right tool for the job” philosophy, and REST does not fit every job. For support, we refer to Leonard Richardson and Sam Ruby, who wrote RESTful Web Services (O’Reilly), as they share our view.

Wrapping Up

This is still not the full CouchDB API, but we discussed the essentials in great detail. We’re going to fill in the blanks as we go. For now, we believe you’re ready to start building CouchDB applications.
PART II
Developing with CouchDB

CHAPTER 5
Design Documents

Design documents are a special type of CouchDB document that contains application code. Because this code runs inside a database, the application API is highly structured. We’ve seen JavaScript views and other functions in the previous chapters. In this section, we’ll take a look at the function APIs, and talk about how functions in a design document are related within applications.

This part (Part II, Chapters 5 through 9) lays the foundation for Part III, where we take what we’ve learned and build a small blog application to further develop an understanding of how CouchDB applications are built. The application is called Sofa, and on a few occasions we discuss it in this part. If you are unclear on what we are referring to, do not worry; we’ll get to it in Part III.

Document Modeling

In our experience, there are two main kinds of documents. The first kind is like something a word processor would save, or a user profile. With that sort of data, you want to denormalize as much as you possibly can. Basically, you want to be able to load the document in one request and get something that makes sense enough to display.

A technique exists for creating “virtual” documents by using views to collate data together. You could use this to store each attribute of your user profiles in a different document, but I wouldn’t recommend it. Virtual documents are useful in cases where the presented view will be created by merging the work of different authors. The reference example is a blog post and its comments, retrieved in one query.
A blog post titled “CouchDB Joins,” by Christopher Lenz, covers this in more detail.*

* http://www.cmlenz.net/archives/2007/10/couchdb-joins

This virtual document idea takes us to the other kind of document: the event log. Use this in cases where you don’t trust user input or where you need to trigger an asynchronous job. This records the user action as an event, so only minimal validation needs to occur at save time. It’s when you load the document for further work that you’d check for complex relational-style constraints.

You can treat documents as state machines, with a combination of user input and background processing managing document state. You’d use a view by state to pull out the relevant document; changing its state would move it in the view.

This approach is also useful for logging. Combined with the batch=ok performance hint, CouchDB should make a fine log store, and reduce views are ideal for finding things like average response time or highly active users.

The Query Server

CouchDB’s default query server (the software package that executes design document functions) is written in JavaScript, but there are view servers available for nearly any language you can imagine. Implementing a new language is a matter of handling a few JSON commands from a simple line-based program.

In this section, we’ll review existing functionality like MapReduce views, update validation functions, and show and list transforms. We’ll also briefly describe capabilities available on CouchDB’s roadmap, like replication filters, update handlers for parsing non-JSON input, and a rewrite handler for making application URLs more palatable. Since CouchDB is an open source project, we can’t really say when each planned feature will become available, but it’s our hope that everything described here is available by the time you read this.
We’ll make it clear in the text when we’re talking about things that aren’t yet in the CouchDB trunk.

Applications Are Documents

CouchDB is designed to work best when there is a one-to-one correspondence between applications and design documents.

A design document is a CouchDB document with an ID that begins with _design/. For instance, the example blog application, Sofa, is stored in a design document with the ID _design/sofa (see Figure 5-1). Design documents are just like any other CouchDB document: they replicate along with the other documents in their database and track edit conflicts with the _rev parameter.

As we’ve seen, design documents are normal JSON documents, denoted by the fact that their DocID is prefixed with _design/.

CouchDB looks for views and other application functions here. The static HTML pages of our application are served as attachments to the design document. Views and validations, however, aren’t stored as attachments; rather, they are directly included in the design document’s JSON body.

Figure 5-1. Anatomy of our design document

CouchDB’s MapReduce queries are stored in the views field. This is how Futon displays and allows you to edit MapReduce queries. View indexes are stored on a per–design document basis, according to a fingerprint of the view functions’ text. This means that if you edit attachments, validations, or any fields other than views (or language) on the design document, the views will not be regenerated. However, if you change a map or a reduce function, the view index will be deleted and a new index built for the new view functions.

CouchDB has the capability to render responses in formats other than raw JSON.
The design doc fields show and list contain functions used to transform raw JSON into HTML, XML, or other Content-Types. This allows CouchDB to serve Atom feeds without any additional middleware. The show and list functions are a little like “actions” in traditional web frameworks: they run some code based on a request and render a response. However, they differ from actions in that they must not have side effects. This means that they are largely restricted to handling GET requests, but it also means they can be cached by HTTP proxies like Varnish.

Because application logic is contained in a single document, code upgrades can be accomplished with CouchDB replication. This also opens the possibility for a single database to host multiple applications. The interface a newspaper editor needs is vastly different from what a reader desires, although the data is largely the same. They can both be hosted by the same database, in different design documents.

A CouchDB database can contain many design documents. Example design DocIDs are:

_design/calendar
_design/contacts
_design/blog
_design/admin

In the full CouchDB URL structure, you’d be able to GET the design document JSON at URLs like:

http://127.0.0.1:5984/mydb/_design/calendar
http://127.0.0.1:5984/mydb/_design/contacts
http://127.0.0.1:5984/mydb/_design/blog
http://127.0.0.1:5984/mydb/_design/admin

We show this to note that design documents are a special case: they are the only documents whose URLs can be used with a literal slash. We’ve done this because nobody likes to see %2F in their browser’s location bar. In all other cases, a slash in a DocID must be escaped when used in a URL. For instance, the DocID movies/jaws would appear in the URL like this: http://127.0.0.1:5984/mydb/movies%2Fjaws.

We’ll build the first iteration of the example application without using show or list, because writing Ajax queries against the JSON API is a better way to teach CouchDB as a database.
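The escaping rule for DocIDs is easy to get wrong when building URLs by hand. The following is a minimal sketch of one way to apply it. docPath() is a hypothetical helper. It uses JavaScript’s standard encodeURIComponent(), which turns / into %2F.

```javascript
// Hypothetical helper: path for a document, following the rule above.
function docPath(db, docId) {
  const prefix = "_design/";
  if (docId.startsWith(prefix)) {
    // Design documents keep the literal slash after _design.
    return "/" + encodeURIComponent(db) + "/_design/" +
      encodeURIComponent(docId.slice(prefix.length));
  }
  // Any other slash in a DocID must be escaped.
  return "/" + encodeURIComponent(db) + "/" + encodeURIComponent(docId);
}

const server = "http://127.0.0.1:5984";
console.log(server + docPath("mydb", "_design/calendar")); // http://127.0.0.1:5984/mydb/_design/calendar
console.log(server + docPath("mydb", "movies/jaws"));      // http://127.0.0.1:5984/mydb/movies%2Fjaws
```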
The APIs we explore in the first iteration are the same APIs you’d use to analyze log data, archive assets, or manage persistent queues.

In the second iteration, we’ll upgrade our example blog so that it can function with client-side JavaScript turned off. For now, sticking to Ajax queries gives more transparency into how CouchDB’s JSON/HTTP API works. JSON is a subset of JavaScript, so working with it in JavaScript keeps the impedance mismatch low, while the browser’s XMLHttpRequest (XHR) object handles the HTTP details for us.

CouchDB uses the validate_doc_update function to prevent invalid or unauthorized document updates from proceeding. We use it in the example application to ensure that blog posts can be authored only by logged-in users. CouchDB’s validation functions also can’t have any side effects, and they have the opportunity to block not only end user document saves, but also replicated documents from other nodes. We’ll talk about validation in depth in Part III.

The raw images, JavaScript, CSS, and HTML assets needed by Sofa are stored in the _attachments field, which is interesting in that by default it shows only the stubs, rather than the full content of the files. Attachments are available on all CouchDB documents, not just design documents, so asset management applications have as much flexibility as they could need. If a set of resources is required for your application to run, they should be attached to the design document. This means that a new user can easily bootstrap your application on an empty database.

The other fields in the design document shown in Figure 5-1 (and in the design documents we’ll be using) are used by CouchApp’s upload process (see Chapter 10 for more information on CouchApp). The signatures field allows us to avoid updating attachments that have not changed between the disk and the database.
It does this by comparing file content hashes. The lib field is used to hold additional JavaScript code and JSON data to be inserted at deploy time into view, show, and validation functions. We’ll explain CouchApp in more detail in Chapter 10.

A Basic Design Document

In the next section we’ll get into advanced techniques for working with design documents, but before we finish here, let’s look at a very basic design document. All we’ll do is define a single view, but it should be enough to show you how design documents fit into the larger system.

First, add the following text (or something like it) to a text file called mydesign.json using your editor:

{
  "_id" : "_design/example",
  "views" : {
    "foo" : {
      "map" : "function(doc){ emit(doc._id, doc._rev)}"
    }
  }
}

Now use curl to PUT the file to CouchDB (we’ll create a database first for good measure):

curl -X PUT http://127.0.0.1:5984/basic
curl -X PUT http://127.0.0.1:5984/basic/_design/example -d @mydesign.json

From the second request, you should see a response like:

{"ok":true,"id":"_design/example","rev":"1-230141dfa7e07c3dbfef0789bf11773a"}

Now we can query the view we’ve defined, but before we do that, we should add a few documents to the database so we have something to view. Running the following command a few times will add empty documents:

curl -X POST http://127.0.0.1:5984/basic -H "Content-Type: application/json" -d '{}'

Now to query the view:

curl http://127.0.0.1:5984/basic/_design/example/_view/foo

This should give you a list of all the documents in the database (except the design document). You’ve created and used your first design document!

Looking to the Future

There are other design document functions that are being introduced at the time of this writing, including _update and _filter, that we aren’t covering in depth here. Filter functions are covered in Chapter 20.
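The _update functions mentioned above, and the non-JSON input described next, can be illustrated with a sketch of an update handler that turns a two-line CSV body into a JSON document. The sketch assumes the update handler convention of later CouchDB releases. In that convention, a function(doc, req) receives the existing document (or null) and the request. It reads the raw body from req.body, uses the server-generated req.uuid for new documents, and returns [documentToSave, responseBody]. The CSV layout, field names, and the simulated request at the end are hypothetical. The parser deliberately ignores CSV quoting.

```javascript
// Hypothetical _update handler (ES5 style, as a design document would store it).
// Expects a header row and one data row of simple CSV: no quotes, no escaped commas.
function csvUpdate(doc, req) {
  var lines = req.body.replace(/\s+$/, "").split(/\r?\n/);
  var names = lines[0].split(",");
  var values = (lines[1] || "").split(",");
  // Update the addressed document, or start a new one with the request's UUID.
  var newDoc = doc || { _id: req.uuid };
  for (var i = 0; i < names.length; i++) {
    newDoc[names[i].trim()] = values[i] === undefined ? null : values[i].trim();
  }
  // First element: document to save; second: response body for the client.
  return [newDoc, "saved " + newDoc._id];
}

// Simulated request object standing in for the one CouchDB would pass in.
const fakeReq = { body: "item,amount\nwidget,12\n", uuid: "0a1b2c" };
const [saved, response] = csvUpdate(null, fakeReq);
console.log(JSON.stringify(saved)); // {"_id":"0a1b2c","item":"widget","amount":"12"}
console.log(response);              // saved 0a1b2c
```

In a design document, such a function would live as a string under an updates field and be reached with a POST to a URL of the form /db/_design/ddocname/_update/handlername. Treat that layout as an assumption and check it against the documentation for your CouchDB version.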
Imagine a web service that POSTs an XML blob at a URL of your choosing when particular events occur. PayPal’s instant payment notification is one of these. With an _update handler, you can POST these directly to CouchDB, and it can parse the XML into a JSON document and save it. The same goes for CSV, multi-part form, or any other format.

The bigger picture we’re working on is like an app server, but different in one crucial regard: rather than let the developer do whatever he wants (loop a list of DocIDs and make queries, make queries based on the results of other queries, etc.), we’re defining “safe” transformations, such as view, show, list, and update. By safe, we mean that they have well-known performance characteristics and otherwise fit into CouchDB’s architecture in a streamlined way.

The goal here is to provide a way to build standalone apps that can also be easily indexed by search engines and used via screen readers. Hence, the push for plain old HTML. You can pretty much rely on JavaScript getting executed (except when you can’t). Having HTML resources means CouchDB is suitable for public-facing web apps.

On the horizon are a rewrite handler and a database event handler, as they seem to flesh out the application capabilities nicely. A rewrite handler would allow your application to present its own URL space, which would make integration into existing systems a bit easier. An event handler would allow you to run asynchronous processes when the database changes, so that, for instance, a document update can trigger a workflow, multi-document validation, or message queue.

CHAPTER 6
Finding Your Data with Views

Views are useful for many purposes:

• Filtering the documents in your database to find those relevant to a particular process.
• Extracting data from your documents and presenting it in a specific order.
• Building efficient indexes to find documents by any value or structure that resides in them.
• Using these indexes to represent relationships among documents.
• Finally, with views you can make all sorts of calculations on the data in your documents. For example, a view can answer the question of what your company’s spending was in the last week, month, or year.

What Is a View?

Let’s go through the different use cases. First is extracting data that you might need for a special purpose in a specific order. For a front page, we want a list of blog post titles sorted by date. We’ll work with a set of example documents as we walk through how views work:

{
  "_id":"biking",
  "_rev":"AE19EBC7654",
  "title":"Biking",
  "body":"My biggest hobby is mountainbiking. The other day...",
  "date":"2009/01/30 18:04:11"
}

{
  "_id":"bought-a-cat",
  "_rev":"4A3BBEE711",
  "title":"Bought a Cat",
  "body":"I went to the pet store earlier and brought home a little kitty...",
  "date":"2009/02/17 21:13:39"
}

{
  "_id":"hello-world",
  "_rev":"43FBA4E7AB",
  "title":"Hello World",
  "body":"Well hello and welcome to my new blog...",
  "date":"2009/01/15 15:52:20"
}

Three will do for the example. Note that the documents are sorted by "_id", which is how they are stored in the database. Now we define a view. Chapter 3 showed you how to create a view in Futon, the CouchDB administration client. Bear with us without an explanation while we show you some code:

function(doc) {
  if(doc.date && doc.title) {
    emit(doc.date, doc.title);
  }
}

This is a map function, and it is written in JavaScript. If you are not familiar with JavaScript but have used C or any other C-like language such as Java, PHP, or C#, this should look familiar. It is a simple function definition.

You provide CouchDB with view functions as strings stored inside the views field of a design document. You don’t run them yourself.
Instead, when you query your view, CouchDB takes the source code and runs it for you on every document in the database your view was defined in. You query your view to retrieve the view result.

All map functions have a single parameter doc. This is a single document in the database. Our map function checks whether our document has a date and a title attribute (luckily, all of our documents have them) and then calls the built-in emit() function with these two attributes as arguments.

The emit() function always takes two arguments: the first is key, and the second is value. The emit(key, value) function creates an entry in our view result. One more thing: the emit() function can be called multiple times in the map function to create multiple entries in the view results from a single document, but we are not doing that yet.

CouchDB takes whatever you pass into the emit() function and puts it into a list (see Table 6-1). Each row in that list includes the key and value. More importantly, the list is sorted by key (by doc.date in our case). The most important feature of a view result is that it is sorted by key. We will come back to that over and over again to do neat things. Stay tuned.

Table 6-1. View results

Key                      Value
"2009/01/15 15:52:20"    "Hello World"
"2009/01/30 18:04:11"    "Biking"
"2009/02/17 21:13:39"    "Bought a Cat"

If you read carefully over the last few paragraphs, one part stands out: “When you query your view, CouchDB takes the source code and runs it for you on every document in the database.” If you have a lot of documents, that takes quite a bit of time, and you might wonder if it is not horribly inefficient to do this. Yes, it would be, but CouchDB is designed to avoid any extra costs: it only runs through all documents once, when you first query your view.
If a document is changed, the map function is only run once, to recompute the keys and values for that single document.

The view result is stored in a B-tree, just like the structure that is responsible for holding your documents. View B-trees are stored in their own file, so that for high-performance CouchDB usage, you can keep views on their own disk. The B-tree provides very fast lookups of rows by key, as well as efficient streaming of rows in a key range. In our example, a single view can answer all questions that involve time: “Give me all the blog posts from last week” or “last month” or “this year.” Pretty neat. Read more about how CouchDB’s B-trees work in Appendix F.

When we query our view, we get back a list of all documents sorted by date. Each row also includes the post title so we can construct links to posts. Table 6-1 is just a graphical representation of the view result. The actual result is JSON-encoded and contains a little more metadata:

{
  "total_rows": 3,
  "offset": 0,
  "rows": [
    {
      "key": "2009/01/15 15:52:20",
      "id": "hello-world",
      "value": "Hello World"
    },
    {
      "key": "2009/01/30 18:04:11",
      "id": "biking",
      "value": "Biking"
    },
    {
      "key": "2009/02/17 21:13:39",
      "id": "bought-a-cat",
      "value": "Bought a Cat"
    }
  ]
}

Now, the actual result is not as nicely formatted and doesn’t include any superfluous whitespace or newlines, but this is better for you (and us!) to read and understand. Where does that "id" member in the result rows come from? That wasn’t there before. That’s because we omitted it earlier to avoid confusion. CouchDB automatically includes the document ID of the document that created the entry in the view result. We’ll use this as well when constructing links to the blog post pages.
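To make the map step concrete, here is a small in-memory imitation of what CouchDB does with the map function above. It runs the function on each document with emit() in scope, collects the rows, and sorts them by key. runView() is a hypothetical helper for experimentation only. It compares keys with JavaScript’s < and > rather than CouchDB’s own collation, and it keeps rows in an array rather than a B-tree.

```javascript
// Hypothetical in-memory stand-in for CouchDB's map step (not CouchDB code).
function runView(mapSource, docs) {
  const rows = [];
  let currentId = null;
  // emit() records a row tagged with the ID of the document being mapped.
  const emit = (key, value) => rows.push({ key: key, id: currentId, value: value });
  // Compile the function source with emit() in scope, roughly as a query server would.
  const map = new Function("emit", "return (" + mapSource + ");")(emit);
  for (const doc of docs) {
    if (doc._id.startsWith("_design/")) continue; // design documents are not mapped
    currentId = doc._id;
    map(doc);
  }
  // Sort by key, then by document ID (plain comparison, not CouchDB collation).
  const cmp = (a, b) => (a < b ? -1 : a > b ? 1 : 0);
  rows.sort((a, b) => cmp(a.key, b.key) || cmp(a.id, b.id));
  return { total_rows: rows.length, offset: 0, rows: rows };
}

// The example documents, trimmed to the fields the map function reads.
const docs = [
  { _id: "biking", title: "Biking", date: "2009/01/30 18:04:11" },
  { _id: "bought-a-cat", title: "Bought a Cat", date: "2009/02/17 21:13:39" },
  { _id: "hello-world", title: "Hello World", date: "2009/01/15 15:52:20" },
];

// The map function from the text, as the string a design document would hold.
const byDate =
  "function(doc) {\n" +
  "  if(doc.date && doc.title) {\n" +
  "    emit(doc.date, doc.title);\n" +
  "  }\n" +
  "}";

const result = runView(byDate, docs);
console.log(JSON.stringify(result.rows.map((r) => r.value)));
// ["Hello World","Biking","Bought a Cat"]
```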
Efficient Lookups

Let’s move on to the second use case for views: “building efficient indexes to find documents by any value or structure that resides in them.” We already explained the efficient indexing, but we skipped a few details. This is a good time to finish this discussion as we are looking at map functions that are a little more complex.

First, back to the B-trees! We explained that the B-tree that backs the key-sorted view result is built only once, when you first query a view, and all subsequent queries will just read the B-tree instead of executing the map function for all documents again. What happens, though, when you change a document, add a new one, or delete one? Easy: CouchDB is smart enough to find the rows in the view result that were created by a specific document. It marks them invalid so that they no longer show up in view results. If the document was deleted, we’re good: the resulting B-tree reflects the state of the database. If a document got updated, the new document is run through the map function and the resulting new rows are inserted into the B-tree at the correct spots. New documents are handled in the same way. Appendix F demonstrates that a B-tree is a very efficient data structure for our needs, and the crash-only design of CouchDB databases is carried over to the view indexes as well.

To add one more point to the efficiency discussion: usually multiple documents are updated between view queries. The mechanism explained in the previous paragraph is applied in a single batch operation to all changes made to the database since the view was last queried, which makes things even faster and is generally a better use of your resources.

Find One

On to more complex map functions. We said “find documents by any value or structure that resides in them.” We already explained how to extract a value by which to sort a view result (our date field). The same mechanism is used for fast lookups.
The URI to query to get a view’s result is /database/_design/designdocname/_view/viewname. This gives you a list of all rows in the view. We have only three documents, so things are small, but with thousands of documents, this can get long. You can add view parameters to the URI to constrain the result set. Say we know the date of a blog post. To find a single document, we would use /blog/_design/docs/_view/by_date?key="2009/01/30 18:04:11" to get the “Biking” blog post. Remember that you can place whatever you like in the key parameter to the emit() function. Whatever you put in there, we can now use to look up exactly, and fast.

Note that in the case where multiple rows have the same key (perhaps we design a view where the key is the name of the post’s author), key queries can return more than one row.

Find Many

We talked about “getting all posts for last month.” If it’s February now, this is as easy as /blog/_design/docs/_view/by_date?startkey="2010/01/01 00:00:00"&endkey="2010/02/00 00:00:00". The startkey and endkey parameters specify an inclusive range on which we can search.

To make things a little nicer and to prepare for a future example, we are going to change the format of our date field. Instead of a string, we are going to use an array, where individual members are part of a timestamp in decreasing significance. This sounds fancy, but it is rather easy. Instead of:

{
  "date": "2009/01/31 00:00:00"
}

we use:

"date": [2009, 1, 31, 0, 0, 0]

Our map function does not have to change for this, but our view result looks a little different. See Table 6-2.

Table 6-2. New view results

Key                          Value
[2009, 1, 15, 15, 52, 20]    "Hello World"
[2009, 1, 30, 18, 4, 11]     "Biking"
[2009, 2, 17, 21, 13, 39]    "Bought a Cat"

And our query changes to /blog/_design/docs/_view/by_date?startkey=[2010, 1, 1, 0, 0, 0]&endkey=[2010, 2, 0, 0, 0, 0]. For all you care, this is just a change in syntax, not meaning. But it shows you the power of views. Not only can you construct an index with scalar values like strings and integers, you can also use JSON structures as keys for your views. Say we tag our documents with a list of tags and want to see all tags, but we don’t care about documents that have not been tagged.

{
  ...
  "tags": ["cool", "freak", "plankton"],
  ...
}

{
  ...
  "tags": [],
  ...
}

function(doc) {
  if(doc.tags && doc.tags.length > 0) {
    for(var idx = 0; idx < doc.tags.length; idx++) {
      emit(doc.tags[idx], null);
    }
  }
}

This shows a few new things. You can have conditions on structure (if(doc.tags && doc.tags.length > 0)) instead of just values. This is also an example of how a map function calls emit() multiple times per document. And finally, you can pass null instead of a value to the value parameter. The same is true for the key parameter. We’ll see in a bit how that is useful.

Reversed Results

To retrieve view results in reverse order, use the descending=true query parameter. If you are using a startkey parameter, you will find that CouchDB returns different rows or no rows at all. What’s up with that?

It’s pretty easy to understand when you see how view query options work under the hood. A view is stored in a tree structure for fast lookups. Whenever you query a view, this is how CouchDB operates:

1. Starts reading at the top, or at the position that startkey specifies, if present.
2. Returns one row at a time until the end or until it hits endkey, if present.

If you specify descending=true, the reading direction is reversed, not the sort order of the rows in the view.
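The two-step reading procedure can be imitated over an array of rows that is already sorted by key. In this sketch, query() is a hypothetical helper, not CouchDB code. It treats key as startkey and endkey set to the same value, compares keys with plain < and >, and, like descending=true, only changes the direction in which rows are read. The rows are the small key/value view used in the next example.

```javascript
// Plain comparison; enough for the numbers and strings used here.
function cmp(a, b) {
  return a < b ? -1 : a > b ? 1 : 0;
}

// Hypothetical helper: reads rows (already sorted by key, as a view index is)
// following the two steps above. key behaves like startkey == endkey.
function query(rows, opts = {}) {
  const start = opts.key !== undefined ? opts.key : opts.startkey;
  const end = opts.key !== undefined ? opts.key : opts.endkey;
  const dir = opts.descending ? -1 : 1;
  // descending=true reverses the reading direction, not the index itself.
  const ordered = opts.descending ? rows.slice().reverse() : rows;
  const out = [];
  for (const row of ordered) {
    // Step 1: skip rows that come before startkey in the reading direction.
    if (start !== undefined && dir * cmp(row.key, start) < 0) continue;
    // Step 2: stop as soon as a row comes after endkey in the reading direction.
    if (end !== undefined && dir * cmp(row.key, end) > 0) break;
    out.push(row);
  }
  return out;
}

const rows = [
  { key: 0, value: "foo" },
  { key: 1, value: "bar" },
  { key: 2, value: "baz" },
];

const values = (opts) => query(rows, opts).map((r) => r.value).join(",");
console.log(values({ startkey: 1, descending: true })); // bar,foo
console.log(values({ endkey: 1, descending: true }));   // baz,bar
```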
In addition, the same two-step procedure is followed. Say you have a view result that looks like this:

Key    Value
0      "foo"
1      "bar"
2      "baz"

Here are potential query options: ?startkey=1&descending=true. What will CouchDB do? See #1 above: it jumps to startkey, which is the row with the key 1, and starts reading backward until it hits the end of the view (which, reading backward, is the top). So the particular result would be:

Key    Value
1      "bar"
0      "foo"

This is very likely not what you want. To get the rows with the indexes 1 and 2 in reverse order, you need to switch the startkey to endkey: endkey=1&descending=true:

Key    Value
2      "baz"
1      "bar"

Now that looks a lot better. CouchDB started reading at the bottom of the view and went backward until it hit endkey.

The View to Get Comments for Posts

We use an array key here to support the group_level reduce query parameter. CouchDB’s views are stored in the B-tree file structure (which will be described in more detail later on). Because of the way B-trees are structured, we can cache the intermediate reduce results in the non-leaf nodes of the tree, so reduce queries can be computed along arbitrary key ranges in logarithmic time. See Figure 6-1.

In the blog app, we use group_level reduce queries to compute the count of comments both on a per-post and total basis, achieved by querying the same view index with different methods. With some array keys, and assuming each key has the value 1:

["a","b","c"]
["a","b","e"]
["a","c","m"]
["b","a","c"]
["b","a","g"]

the reduce view:

function(keys, values, rereduce) {
  return sum(values);
}

returns the total number of rows between the start and end key. So with startkey=["a","b"]&endkey=["b"] (which includes the first three of the above keys) the result would equal 3. The effect is to count rows.
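The range count above and the group_level results discussed next can be checked with a brute-force sketch. cmpKey() is a simplified stand-in for CouchDB’s collation. It handles only arrays of strings, and a shorter array that is a prefix of a longer one sorts first. reduceRange() and groupLevel() are hypothetical helpers. CouchDB reuses reduce results cached in the B-tree’s inner nodes, but this sketch scans every row, so only the answers are comparable, not the costs.

```javascript
// Simplified collation for arrays of strings only: compare element by element;
// if one array is a prefix of the other, the shorter one sorts first.
function cmpKey(a, b) {
  for (let i = 0; i < Math.min(a.length, b.length); i++) {
    if (a[i] < b[i]) return -1;
    if (a[i] > b[i]) return 1;
  }
  return a.length - b.length;
}

// Stand-in for the sum() reduce from the text.
function sum(values) {
  return values.reduce((total, v) => total + v, 0);
}

// The five array keys from the text, each with the value 1, in key order.
const rows = [
  ["a", "b", "c"], ["a", "b", "e"], ["a", "c", "m"], ["b", "a", "c"], ["b", "a", "g"],
].map((key) => ({ key: key, value: 1 }));

// Reduce over the inclusive range startkey..endkey (brute force).
function reduceRange(startkey, endkey) {
  const inRange = rows.filter(
    (r) => cmpKey(r.key, startkey) >= 0 && cmpKey(r.key, endkey) <= 0
  );
  return sum(inRange.map((r) => r.value));
}

// group_level=n: one reduce per distinct key prefix of length n.
function groupLevel(n) {
  const groups = new Map(); // prefix as JSON text -> values; insertion order = key order
  for (const r of rows) {
    const prefix = JSON.stringify(r.key.slice(0, n));
    if (!groups.has(prefix)) groups.set(prefix, []);
    groups.get(prefix).push(r.value);
  }
  return [...groups].map(([prefix, vals]) => ({ key: JSON.parse(prefix), value: sum(vals) }));
}

console.log(reduceRange(["a", "b"], ["b"])); // 3
console.log(JSON.stringify(groupLevel(1)));  // [{"key":["a"],"value":3},{"key":["b"],"value":2}]
console.log(JSON.stringify(groupLevel(2)));
```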
If you’d like to count rows without depending on the row value, you can switch on the rereduce parameter: The View to Get Comments for Posts | 59 WWW.EBOOK777.COM www.it-ebooks.info free ebooks ==> www.ebook777.com function(keys, values, rereduce) { if (rereduce) { return sum(values); } else { return values.length; } } Figure 6-1. Comments map function This is the reduce view used by the example app to count comments, while utilizing the map to output the comments, which are more useful than just 1 over and over. It pays to spend some time playing around with map and reduce functions. Futon is OK for this, but it doesn’t give full access to all the query parameters. Writing your own test code for views in your language of choice is a great way to explore the nuances and capabilities of CouchDB’s incremental MapReduce system. Anyway, with a group_level query, you’re basically running a series of reduce range queries: one for each group that shows up at the level you query. Let’s reprint the key list from earlier, grouped at level 1: ["a"] ["b"] 3 2 And at group_level=2: ["a","b"] ["a","c"] ["b","a"] 2 1 2 60 | Chapter 6: Finding Your Data with Views WWW.EBOOK777.COM www.it-ebooks.info free ebooks ==> www.ebook777.com Using the parameter group=true makes it behave as though it were group_level=Exact, so in the case of our current example, it would give the number 1 for each key, as there are no exactly duplicated keys. Reduce/Rereduce We briefly talked about the rereduce parameter to your reduce function. We’ll explain what’s up with it in this section. By now, you should have learned that your view result is stored in B-tree index structure for efficiency. The existence and use of the rere duce parameter is tightly coupled to how the B-tree index works. Consider the map result shown in Example 6-1. Example 6-1. 
Example view result (mmm, food)

"afrikan", 1
"afrikan", 1
"chinese", 1
"chinese", 1
"chinese", 1
"chinese", 1
"french", 1
"italian", 1
"italian", 1
"spanish", 1
"vietnamese", 1
"vietnamese", 1

When we want to find out how many dishes there are per origin, we can reuse the simple reduce function shown earlier:

function(keys, values, rereduce) {
  return sum(values);
}

Figure 6-2 shows a simplified version of what the B-tree index looks like. We abbreviated the key strings.

Figure 6-2. The B-tree index

The view result is what computer science grads call a “pre-order” walk through the tree. We look at each element in each node starting from the left. Whenever we see that there is a subnode to descend into, we descend and start reading the elements in that subnode. When we have walked through the entire tree, we’re done.

You can see that CouchDB stores both keys and values inside each leaf node. In our case, the value is simply always 1, but you might have a value where you count other results, and then rows can have different values. What’s important is that CouchDB runs all elements that are within a node through the reduce function (setting the rereduce parameter to false) and stores the result inside the parent node along with the edge to the subnode. In our case, each edge has a 3 representing the reduce value for the node it points to.

In reality, nodes have more than 1,600 elements in them. CouchDB computes the result for all the elements in multiple iterations over the elements in a single node, not all at once (which would be disastrous for memory consumption).

Now let’s see what happens when we run a query. We want to know how many "chinese" entries we have. The query option is simple: ?key="chinese". See Figure 6-3.

Figure 6-3.
The B-tree index reduce result

CouchDB detects that all values in the subnode include the "chinese" key. It concludes that it can take just the 3 value associated with that node to compute the final result. It then finds the node to its left and sees that it’s a node with keys outside the requested range (key= requests a range where the beginning and the end are the same value). It concludes that it has to use the "chinese" element’s value and the other node’s value and run them through the reduce function with the rereduce parameter set to true.

The reduce function effectively calculates 3 + 1 at query time and returns the desired result. Example 6-2 is pseudocode for the last invocation of the reduce function with actual values.

Example 6-2. The result is 4

function(null, [3, 1], true) {
  return sum([3, 1]);
}

Now, we said your reduce function must actually reduce your values. If you look at the B-tree, it should become obvious what happens when you don’t reduce your values. Consider the following map result and reduce function. This time we want to get a list of all the unique labels in our view:

"abc", "afrikan"
"cef", "afrikan"
"fhi", "chinese"
"hkl", "chinese"
"ino", "chinese"
"lqr", "chinese"
"mtu", "french"
"owx", "italian"
"qza", "italian"
"tdx", "spanish"
"xfg", "vietnamese"
"zul", "vietnamese"

We don’t care about the key here and only list all the labels we have. Our reduce function removes duplicates; see Example 6-3.

Example 6-3. Don’t use this, it’s an example broken on purpose

function(keys, values, rereduce) {
  var unique_labels = {};
  values.forEach(function(label) {
    if (!unique_labels[label]) {
      unique_labels[label] = true;
    }
  });
  return unique_labels;
}

This translates to Figure 6-4.

We hope you get the picture.
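To make the picture concrete, here is a rough simulation of how the stored values grow. It is a sketch under stated assumptions:

- Every node has a fixed fan-out of three entries. Real CouchDB nodes are much larger.
- Leaves are reduced with rereduce=false, and inner levels with rereduce=true, until one root value remains.
- No size check is enforced.

The sketch compares a counting reduce with a rereduce-aware variant of the unique-labels idea from Example 6-3. That variant is still the anti-pattern this section warns against. It is here only to show that its root value holds every distinct label, so its size tracks the number of rows instead of staying constant.

```javascript
// Stand-in for CouchDB's built-in sum().
const sum = (values) => values.reduce((total, v) => total + v, 0);

// Simplified B-tree reduction: reduce each leaf chunk of rows with
// rereduce=false, then rereduce chunks of intermediate values, level by
// level, until one root value is left.
function treeReduce(rows, reduceFn, fanout = 3) {
  let level = [];
  for (let i = 0; i < rows.length; i += fanout) {
    const leaf = rows.slice(i, i + fanout);
    // On the non-rereduce call, keys are passed as [key, docid] pairs.
    level.push(reduceFn(leaf.map((r) => [r.key, r.id]), leaf.map((r) => r.value), false));
  }
  while (level.length > 1) {
    const parents = [];
    for (let i = 0; i < level.length; i += fanout) {
      parents.push(reduceFn(null, level.slice(i, i + fanout), true));
    }
    level = parents;
  }
  return level[0];
}

// Well-behaved: every stored value is a single number.
function countRows(keys, values, rereduce) {
  return rereduce ? sum(values) : values.length;
}

// Anti-pattern: every stored value is an object holding all labels seen below it.
function uniqueLabels(keys, values, rereduce) {
  const labels = {};
  for (const v of values) {
    for (const label of rereduce ? Object.keys(v) : [v]) labels[label] = true;
  }
  return labels;
}

// 1,000 rows, each with a distinct label as its value.
const rows = Array.from({ length: 1000 }, (_, i) => ({ id: "doc" + i, key: "k" + i, value: "label" + i }));

console.log(treeReduce(rows, countRows));                        // 1000
console.log(Object.keys(treeReduce(rows, uniqueLabels)).length); // 1000 labels in one root value
```

Every inner node of the second tree also stores an object listing all labels below it. The total amount of stored data therefore grows with the row count, which is the overflow that Figure 6-4 depicts.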
The way the B-tree storage works means that if you don’t actually reduce your data in the reduce function, you end up having CouchDB copy huge amounts of data around that grow linearly, if not faster, with the number of rows in your view.

CouchDB will be able to compute the final result, but only for views with a few rows. Anything larger will experience a ridiculously slow view build time. To help with that, since version 0.10.0 CouchDB throws an error if your reduce function does not reduce its input values.

See Chapter 21 for an example of how to compute unique lists with views.

Figure 6-4. An overflowing reduce index

Lessons Learned

• If you don’t use the key field in the map function, you are probably doing it wrong.
• If you are trying to make a list of values unique in a reduce function, you are probably doing it wrong.
• If you don’t reduce your values to a single scalar value or a small fixed-size object or array with a fixed number of scalar values of small sizes, you are probably doing it wrong.

Wrapping Up

Map functions are side effect–free functions that take a document as argument and emit key/value pairs. CouchDB stores the emitted rows by constructing a sorted B-tree index, so row lookups by key, as well as streaming operations across a range of rows, can be accomplished in a small memory and processing footprint, while writes avoid seeks. Generating a view takes O(N), where N is the total number of rows in the view. However, querying a view is very quick, as the B-tree remains shallow even when it contains many, many keys.

Reduce functions operate on the sorted rows emitted by map view functions. CouchDB’s reduce functionality takes advantage of one of the fundamental properties of B-tree indexes: for every leaf node (a sorted row), there is a chain of internal nodes reaching back to the root.
Each leaf node in the B-tree carries a few rows (on the order of tens, depending on row size), and each internal node may link to a few leaf nodes or other internal nodes.

The reduce function is run on every node in the tree in order to calculate the final reduce value. The end result is a reduce function that can be incrementally updated as the rows emitted by the map function change, while recalculating the reduction values for a minimum number of nodes. The initial reduction is calculated once per node (inner and leaf) in the tree.

When run on leaf nodes (which contain actual map rows), the reduce function’s third parameter, rereduce, is false. The arguments in this case are the keys and values as output by the map function. The function has a single returned reduction value, which is stored on the inner node that a working set of leaf nodes have in common, and is used as a cache in future reduce calculations.

When the reduce function is run on inner nodes, the rereduce flag is true. This allows the function to account for the fact that it will be receiving its own prior output. When rereduce is true, the values passed to the function are intermediate reduction values as cached from previous calculations. When the tree is more than two levels deep, the rereduce phase is repeated, consuming chunks of the previous level’s output until the final reduce value is calculated at the root node.

A common mistake new CouchDB users make is attempting to construct complex aggregate values with a reduce function. Full reductions should result in a scalar value, like 5, and not, for instance, a JSON hash with a set of unique keys and the count of each. The problem with this approach is that you’ll end up with a very large final value. The number of unique keys can be nearly as large as the number of total keys, even for a large set.
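By contrast, here is a sketch of a reduce whose output stays the same size no matter how many rows it covers. It returns a small object with a count, a sum, and a sum of squares. After the query, a total, mean, and population standard deviation can be derived from those three numbers.

The field names and the finish() helper are illustrative assumptions. The sum-of-squares formula is the simple textbook one, and it can lose precision on large or widely spread values.

```javascript
// Reduce to a fixed-size object instead of a growing collection.
function stats(keys, values, rereduce) {
  if (!rereduce) {
    // Leaf call: values are the raw numbers emitted by the map function.
    return {
      count: values.length,
      sum: values.reduce((a, v) => a + v, 0),
      sumsq: values.reduce((a, v) => a + v * v, 0),
    };
  }
  // Rereduce call: values are earlier {count, sum, sumsq} results.
  return values.reduce(
    (a, s) => ({ count: a.count + s.count, sum: a.sum + s.sum, sumsq: a.sumsq + s.sumsq }),
    { count: 0, sum: 0, sumsq: 0 }
  );
}

// Client-side step after the query: derive the statistics from the partials.
function finish({ count, sum, sumsq }) {
  const mean = sum / count;
  return { total: sum, mean, stddev: Math.sqrt(sumsq / count - mean * mean) };
}

// Three leaves reduced with rereduce=false (keys unused, passed as null here),
// then combined with rereduce=true, as an inner node would do.
const leaves = [[2, 4, 4], [4, 5, 5], [7, 9]].map((vals) => stats(null, vals, false));
const root = stats(null, leaves, true);
console.log(root);         // { count: 8, sum: 40, sumsq: 232 }
console.log(finish(root)); // { total: 40, mean: 5, stddev: 2 }
```

However the rows are split across leaves, every stored value is three numbers. That is the "few scalar calculations in one reduce function" shape the text recommends.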
It is fine to combine a few scalar calculations into one reduce function; for instance, to find the total, average, and standard deviation of a set of numbers in a single function.

If you’re interested in pushing the edge of CouchDB’s incremental reduce functionality, have a look at Google’s paper on Sawzall, which gives examples of some of the more exotic reductions that can be accomplished in a system with similar constraints.

CHAPTER 7
Validation Functions

In this chapter, we look closely at the individual components of Sofa’s validation function. Sofa has the basic set of validation features you’ll want in your apps, so understanding its validation function will give you a good foundation for others you may write in the future.

CouchDB uses the validate_doc_update function to prevent invalid or unauthorized document updates from proceeding. We use it in the example application to ensure that blog posts can be authored only by logged-in users. CouchDB’s validation functions, like map and reduce functions, can’t have any side effects; they run in isolation from a request. They have the opportunity to block not only end-user document saves, but also replicated documents from other CouchDBs.

Document Validation Functions

To ensure that users may save only documents that provide these fields, we can validate their input by adding another member to the _design/ document: the validate_doc_update function. This is the first time you’ve seen CouchDB’s external process in action. CouchDB sends functions and documents to a JavaScript interpreter. This mechanism is what allows us to write our document validation functions in JavaScript. The validate_doc_update function gets executed for each document you want to create or update.
If the validation function raises an exception, the update is denied; when it doesn’t, the update is accepted.

Document validation is optional. If you don’t create a validation function, no checking is done and documents with any content or structure can be written into your CouchDB database. If you have multiple design documents, each with a validate_doc_update function, all of those functions are called upon each incoming write request. Only if all of them pass does the write succeed. The order of the validation execution is not defined. Each validation function must act on its own. See Figure 7-1.

Figure 7-1. The JavaScript document validation function

Validation functions can cancel document updates by throwing errors. To throw an error in such a way that the user will be asked to authenticate before retrying the request, use JavaScript code like:

throw({unauthorized : message});

When you’re trying to prevent an authorized user from saving invalid data, use this:

throw({forbidden : message});

This function throws forbidden errors when a post does not contain the necessary fields. In places it uses a validate() helper to clean up the JavaScript. We also use simple JavaScript conditionals to ensure that the doc._id is set to be the same as doc.slug for the sake of pretty URLs.

If no exceptions are thrown, CouchDB considers the incoming document valid and will write it to the database. By using JavaScript to validate JSON documents, we can deal with any structure a document might have. Given that you can just make up document structure as you go, being able to validate what you come up with is pretty flexible and powerful. Validation can also be a valuable form of documentation.
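To experiment with these rules outside CouchDB, a small harness can mimic the behavior described above:

- Every validation function sees the write.
- Any thrown error rejects the write.
- The key of the thrown object decides the kind of error.

The checkWrite() harness and the two example rules are hypothetical illustrations, not CouchDB's implementation. The status codes follow CouchDB's documented responses: 401 for unauthorized and 403 for forbidden. Unlike CouchDB, this sketch runs the functions in array order. In CouchDB the order is undefined, so a document that fails two functions could receive either error.

```javascript
// Hypothetical harness, not CouchDB's implementation: run every
// validate_doc_update function against one proposed write.
function checkWrite(validators, newDoc, oldDoc, userCtx) {
  for (const validate of validators) {
    try {
      validate(newDoc, oldDoc, userCtx); // the return value is ignored
    } catch (err) {
      // CouchDB answers {unauthorized: ...} with 401 and {forbidden: ...} with 403.
      if (err && err.unauthorized) return { ok: false, status: 401, reason: err.unauthorized };
      if (err && err.forbidden) return { ok: false, status: 403, reason: err.forbidden };
      throw err; // this sketch treats any other exception as a bug in the validator
    }
  }
  return { ok: true }; // every function passed, so the write would be accepted
}

// Two validation functions, as if from two design documents (example rules).
function requireTitle(newDoc, oldDoc, userCtx) {
  if (!newDoc.title) throw({ forbidden: "Document must have a title" });
}
function requireLogin(newDoc, oldDoc, userCtx) {
  if (!userCtx.name) throw({ unauthorized: "Please log in" });
}
const validators = [requireTitle, requireLogin];

const jan = { name: "jan", roles: [] };
const anonymous = { name: null, roles: [] };
console.log(checkWrite(validators, { title: "Hello" }, null, jan));              // { ok: true }
console.log(checkWrite(validators, {}, null, jan).status);                       // 403
console.log(checkWrite(validators, { title: "Hello" }, null, anonymous).status); // 401
```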
Validation’s Context

Before we delve into the details of our validation function, let’s talk about the context in which validation functions run and the effects they can have.

Validation functions are stored in design documents under the validate_doc_update field. There is only one per design document, but there can be many design documents in a database. In order for a document to be saved, it must pass validations on all design documents in the database (the order in which multiple validations are executed is left undefined). In this chapter, we’ll assume you are working in a database with only one validation function.

Writing One

The function declaration is simple. It takes three arguments: the proposed document update, the current version of the document on disk, and an object corresponding to the user initiating the request.

function(newDoc, oldDoc, userCtx) {}

Above is the simplest possible validation function, which, when deployed, would allow all updates regardless of content or user roles. The converse, which never lets anyone do anything, looks like this:

function(newDoc, oldDoc, userCtx) {
  throw({forbidden : 'no way'});
}

Note that if you install this function in your database, you won’t be able to perform any other document operations until you remove it from the design document or delete the design document. Admins can create and delete design documents despite the existence of this extreme validation function.

We can see from these examples that the return value of the function is ignored. Validation functions prevent document updates by raising errors. When the validation function passes without raising errors, the update is allowed to proceed.

Type

The most basic use of validation functions is to ensure that documents are properly formed to fit your application’s expectations.
Without validation, you need to check for the existence of all fields on a document that your MapReduce or user-interface code needs to function. With validation, you know that any saved documents meet whatever criteria you require.

A common pattern in most languages, frameworks, and databases is using types to distinguish between subsets of your data. For instance, in Sofa we have a few document types, most prominently post and comment.

CouchDB itself has no notion of types, but they are a convenient shorthand for use in your application code, including MapReduce views, display logic, and user interface code. The convention is to use a field called type to store document types, but many frameworks use other fields, as CouchDB itself doesn’t care which field you use. (For instance, the CouchRest Ruby client uses couchrest-type.)

Here’s an example validation function that runs only on posts:

function(newDoc, oldDoc, userCtx) {
  if (newDoc.type == "post") {
    // validation logic goes here
  }
}

Since CouchDB stores only one validation function per design document, you’ll end up validating multiple types in one function, so the overall structure becomes something like:

function(newDoc, oldDoc, userCtx) {
  if (newDoc.type == "post") {
    // validation logic for posts
  }
  if (newDoc.type == "comment") {
    // validation logic for comments
  }
  if (newDoc.type == "unicorn") {
    // validation logic for unicorns
  }
}

It bears repeating that type is a completely optional field. We present it here as a helpful technique for managing validations in CouchDB, but there are other ways to write validation functions.
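Here is the multi-type skeleton above with a couple of rules filled in, plus a tiny driver that exercises it outside CouchDB. The specific rules, messages, and the rejection() helper are assumptions made for the example, not Sofa's.

```javascript
// The multi-type skeleton with example rules filled in (the rules are
// illustrative, not Sofa's). Unknown or missing types match no branch.
function validate(newDoc, oldDoc, userCtx) {
  function forbid(message) {
    throw({ forbidden: message });
  }
  if (newDoc.type === "post") {
    if (!newDoc.title) forbid("Posts need a title");
  }
  if (newDoc.type === "comment") {
    if (!newDoc.comment) forbid("Comments may not be empty");
  }
}

// Returns the rejection message, or null if the write would be accepted.
function rejection(doc) {
  try {
    validate(doc, null, { name: "jan", roles: [] });
    return null;
  } catch (err) {
    return err.forbidden;
  }
}

console.log(rejection({ type: "post", title: "Hello" })); // null
console.log(rejection({ type: "post" }));                 // Posts need a title
console.log(rejection({ type: "comment", comment: "" })); // Comments may not be empty
console.log(rejection({ type: "psot" }));                 // null: a misspelled type skips every rule
```

The last call shows a side effect of dispatching on type: a document whose type matches no branch is accepted without any checks. Depending on the application, a final branch that rejects unknown types may be worth adding.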
Here’s an example that uses duck typing instead of an explicit type attribute:

function(newDoc, oldDoc, userCtx) {
  if (newDoc.title && newDoc.body) {
    // validate that the document has an author
  }
}

This validation function ignores the type attribute altogether and instead makes the somewhat simpler requirement that any document with both a title and a body must have an author. For some applications, typeless validations are simpler. For others, it can be a pain to keep track of which sets of fields are dependent on one another.

In practice, many applications end up using a mix of typed and untyped validations. For instance, Sofa uses document types to track which fields are required on a given document, but it also uses duck typing to validate the structure of particular named fields. We don’t care what sort of document we’re validating. If the document has a created_at field, we ensure that the field is a properly formed timestamp. Similarly, when we validate the author of a document, we don’t care what type of document it is; we just ensure that the author matches the user who saved the document.

Required Fields

The most fundamental validation is ensuring that particular fields are available on a document. The proper use of required fields can make writing MapReduce views much simpler, as you don’t have to test for all the properties before using them; you know all documents will be well-formed.

Required fields also make display logic much simpler. Nothing says amateur like the word undefined showing up throughout your application. If you know for certain that all documents will have a field, you can avoid lengthy conditional statements to render the display differently depending on document structure.

Sofa requires a different set of fields on posts and comments.
Here’s a subset of the Sofa validation function:

function(newDoc, oldDoc, userCtx) {
  function require(field, message) {
    message = message || "Document must have a " + field;
    if (!newDoc[field]) throw({forbidden : message});
  }

  if (newDoc.type == "post") {
    require("title");
    require("created_at");
    require("body");
    require("author");
  }
  if (newDoc.type == "comment") {
    require("name");
    require("created_at");
    require("comment", "You may not leave an empty comment");
  }
}

This is our first look at actual validation logic. You can see that the actual error-throwing code has been wrapped in a helper function. Helpers like the require function just shown go a long way toward making your code clean and readable. The require function is simple. It takes a field name and an optional message, and it ensures that the field is not empty or blank.

Once we’ve declared our helper function, we can simply use it in a type-specific way. Posts require a title, a timestamp, a body, and an author. Comments require a name, a timestamp, and the comment itself. If we wanted to require that every single document contained a created_at field, we could move that declaration outside of any type conditional logic.

Timestamps

Timestamps are an interesting problem in validation functions. Because validation functions are run at replication time as well as during normal client access, we can’t require that timestamps be set close to the server’s system time. We can require two things: that timestamps do not change after they are initially set, and that they are well formed. What it means to be well formed depends on your application. We’ll look at Sofa’s particular requirements here, as well as digress a bit about other options for timestamp formats.
First, let’s look at a validation helper that does not allow fields, once set, to be changed on subsequent updates:

function(newDoc, oldDoc, userCtx) {
  function unchanged(field) {
    if (oldDoc && toJSON(oldDoc[field]) != toJSON(newDoc[field]))
      throw({forbidden : "Field can't be changed: " + field});
  }
  unchanged("created_at");
}

The unchanged helper is a little more complex than the require helper, but not much. The oldDoc check in its first line means it does nothing when a document is saved for the first time. The unchanged helper doesn’t care at all what goes into a field the first time it is saved. However, if there exists an already-saved version of the document, the unchanged helper requires that whatever fields it is used on are the same between the new and the old version of the document.

JavaScript’s equality test is not well suited to working with deeply nested objects. We use CouchDB’s JavaScript runtime’s built-in toJSON function in our equality test, which is better than testing for raw equality. Here’s why:

js> [] == []
false

JavaScript considers these arrays to be different because it doesn’t look at the contents of the array when making the decision. Since they are distinct objects, JavaScript must consider them not equal. We use the toJSON function to convert objects to a string representation, which makes comparisons more likely to succeed in the case where two objects have the same contents. This is not guaranteed to work for deeply nested objects, as toJSON may serialize objects with the same contents differently (for example, objects whose keys were added in a different order).

The js command gets installed when you install CouchDB’s SpiderMonkey dependency. It is a command-line application that lets you parse, evaluate, and run JavaScript code. js lets you quickly test JavaScript code snippets like the one previously shown. You can also run a syntax check of your JavaScript code using js file.js.
In case CouchDB’s error messages are not helpful, you can resort to testing your code standalone and get a useful error report.

Authorship

Authorship is an interesting question in distributed systems. In some environments, you can trust the server to ascribe authorship to a document. Currently, CouchDB has a simple built-in validation system that manages node admins. There are plans to add a database admin role, as well as other roles. The authentication system is pluggable, so you can integrate with existing services to authenticate users to CouchDB using an HTTP layer, using LDAP integration, or through other means.

Sofa uses the built-in node admin account system and so is best suited for a single author or a small group of authors. Extending Sofa to store author credentials in CouchDB itself is an exercise left to the reader.

Sofa’s validation logic says that documents saved with an author field must be saved by the author listed on that field:

function(newDoc, oldDoc, userCtx) {
  if (newDoc.author) {
    // enforce() is a Sofa helper (not shown here) that rejects the update
    // when its condition is false
    enforce(newDoc.author == userCtx.name,
      "You may only update documents with author " + userCtx.name);
  }
}

Wrapping Up

Validation functions are a powerful tool to ensure that only documents you expect end up in your databases. You can test writes to your database by content, by structure, and by user who is making the document request. Together, these three angles let you build sophisticated validation routines that will stop anyone from tampering with your database.

Of course, validation functions are no substitute for a full security system, although they go a long way and work well with CouchDB’s other security mechanisms. Read more about CouchDB’s security in Chapter 22.

CHAPTER 8
Show Functions

CouchDB’s JSON documents are great for programmatic access in most environments.
Almost all languages have HTTP and JSON libraries, and in the unlikely event that yours doesn’t, writing them is fairly simple. However, there is one important use case that JSON documents don’t cover: building plain old HTML web pages. Browsers are powerful, and it’s exciting that we can build Ajax applications using only CouchDB’s JSON and HTTP APIs, but this approach is not appropriate for most public-facing websites.

HTML is the lingua franca of the web, for good reasons. By rendering our JSON documents into HTML pages, we make them available and accessible for a wider variety of uses. With the pure Ajax approach, visually impaired visitors to our blog risk seeing no useful content at all, as popular screen-reading browsers have a hard time making sense of pages when the content is changed on the fly via JavaScript. Another important concern for authors is that their writing be indexed by search engines. Maintaining a high-quality blog doesn’t do much good if readers can’t find it via a web search. Most search engines do not execute JavaScript found within a page, so to them an Ajax blog looks devoid of content. We also mustn’t forget that HTML is likely a friendlier long-term archive format than the platform-specific JavaScript and JSON approach we used in previous chapters. Also, by serving plain HTML, we make our site snappier, as the browser can render meaningful content with fewer round-trips to the server. These are just a few of the reasons it makes sense to provide web content as HTML.

The traditional way to accomplish the goal of rendering HTML from database records is by using a middle-tier application server, such as Ruby on Rails or Django, which loads the appropriate records for a user request, runs a template function using them, and returns the resulting HTML to the visitor’s browser.
The basics of this don’t change in CouchDB’s case; wrapping JSON views and documents with an application server is relatively straightforward. Rather than using browser-side JavaScript to load JSON from CouchDB and render dynamic pages, Rails or Django (or your framework of choice) could make those same HTTP requests against CouchDB, render the output to HTML, and return it to the browser. We won’t cover this approach in this book, as it is specific to particular languages and frameworks, and surveying the different options would take more space than you want to read.

CouchDB includes functionality designed to make it possible to do most of what an application tier would do, without relying on additional software. The appeal of this approach is that CouchDB can serve the whole application without dependencies on a complex environment such as might be maintained on a production web server. Because CouchDB is designed to run on client computers, where the environment is out of the control of application developers, having some built-in templating capabilities greatly expands the potential uses of these applications. When your application can be served by a standard CouchDB instance, you gain deployment ease and flexibility.

The Show Function API

Show functions, as they are called, have a constrained API designed to ensure cacheability and side effect–free operation. This is in stark contrast to other application servers, which give the programmer the freedom to run any operation as the result of any request. Let’s look at a few example show functions.

The most basic show function looks something like this:

function(doc, req) {
  return '

We start with just a raw HTML document, containing a normal HTML form. We use JavaScript to convert user input into a JSON document and save it to CouchDB. In the spirit of focusing on CouchDB, we won’t dwell on the JavaScript here.
It’s a combination of Sofa-specific application code, CouchApp’s JavaScript helpers, and jQuery for interface elements. The basic story is that it watches for the user to click “Save,” and then applies some callbacks to the document before sending it to CouchDB.

Figure 12-2. HTML listing for edit.html

Saving a Document

The JavaScript that drives blog post creation and editing centers around the HTML form from Figure 12-2. The CouchApp jQuery plug-in provides some abstraction, so we don’t have to concern ourselves with the details of how the form is converted to a JSON document when the user hits the submit button. $.CouchApp also ensures that the user is logged in and makes her information available to the application. See Figure 12-3.

$.CouchApp(function(app) {
  app.loggedInNow(function(login) {

The first thing we do is ask the CouchApp library to make sure the user is logged in. Assuming the answer is yes, we’ll proceed to set up the page as an editor. This means we apply a JavaScript event handler to the form and specify callbacks we’d like to run on the document, both when it is loaded and when it is saved.

Figure 12-3. JavaScript callbacks for edit.html

    // w00t, we're logged in (according to the cookie)
    $("#header").prepend(''+login+'');
    // setup CouchApp document/form system, adding app-specific callbacks
    var B = new Blog(app);

Now that we know the user is logged in, we can render his username at the top of the page. The variable B is just a shortcut to some of the Sofa-specific blog rendering code. It contains methods for converting blog post bodies from Markdown to HTML, as well as a few other odds and ends. We pulled these functions into blog.js so we could keep them out of the way of the main code.
    var postForm = app.docForm("form#new-post", {
      id : <%= docid %>,
      fields : ["title", "body", "tags"],
      template : {
        type : "post",
        format : "markdown",
        author : login
      },

CouchApp’s app.docForm() helper is a function to set up and maintain a correspondence between a CouchDB document and an HTML form. Let’s look at the first three options Sofa passes to it. The id option tells docForm() where to save the document. This can be null in the case of a new document. We set fields to an array of form elements that will correspond directly to JSON fields in the CouchDB document. Finally, the template option is given a JavaScript object that will be used as the starting point, in the case of a new document. In this case, we ensure that the document has a type equal to “post,” and that the default format is Markdown. We also set the author to be the login name of the current user.

      onLoad : function(doc) {
        if (doc._id) {
          B.editing(doc._id);
          $('h1').html('Editing '+doc._id+'');
          $('#preview').before(' ');
          $("#delete").click(function() {
            postForm.deleteDoc({
              success: function(resp) {
                $("h1").text("Deleted "+resp.id);
                $('form#new-post input').attr('disabled', true);
              }
            });
            return false;
          });
        }
        $('label[for=body]').append(' with '+(doc.format||'html')+'');

The onLoad callback is run when the document is loaded from CouchDB. It is useful for decorating the document before passing it to the form, or for setting up other user interface elements. In this case, we check to see if the document already has an ID. If it does, that means it’s been saved, so we create a button for deleting it and set up the callback to the delete function. It may look like a lot of code, but it’s pretty standard for Ajax applications.
If there is one criticism to make of this section, it’s that the logic for creating the delete button could be moved to the blog.js file so we can keep more user-interface details out of the main flow.

      },
      beforeSave : function(doc) {
        doc.html = B.formatBody(doc.body, doc.format);
        if (!doc.created_at) {
          doc.created_at = new Date();
        }
        if (!doc.slug) {
          doc.slug = app.slugifyString(doc.title);
          doc._id = doc.slug;
        }
        if (doc.tags) {
          doc.tags = doc.tags.split(",");
          for (var idx in doc.tags) {
            doc.tags[idx] = $.trim(doc.tags[idx]);
          }
        }
      },

The beforeSave() callback to docForm is run after the user clicks the submit button. In Sofa’s case, it manages setting the blog post’s timestamp, transforming the title into an acceptable document ID (for prettier URLs), and processing the document tags from a string into an array. It also runs the Markdown-to-HTML conversion in the browser so that once the document is saved, the rest of the application has direct access to the HTML.

      success : function(resp) {
        $("#saved").text("Saved _rev: "+resp.rev).fadeIn(500).fadeOut(3000);
        B.editing(resp.id);
      }
    });

The last callback we use in Sofa is the success callback. It is fired when the document is successfully saved. In our case, we use it to flash a message to the user that lets her know she’s succeeded, as well as to add a link to the blog post so that when you create a blog post for the first time, you can click through to see its permalink page. That’s it for the docForm() callbacks.

    $("#preview").click(function() {
      var doc = postForm.localDoc();
      var html = B.formatBody(doc.body, doc.format);
      $('#show-preview').html(html);
      // scroll down
      $('body').scrollTo('#show-preview', {duration: 500});
    });

Sofa has a function to preview blog posts before saving them.
Since this doesn’t affect how the document is saved, the code that watches for events from the “preview” button is not applied within the docForm() callbacks.

  }, function() {
    app.go('<%= assets %>/account.html#'+document.location);
  });
});

The last bit of code here is triggered when the user is not logged in. All it does is redirect him to the account page so that he can log in and try editing again.

Validation

Hopefully, you can see how the previous code will send a JSON document to CouchDB when the user clicks save. That’s great for creating a user interface, but it does nothing to protect the database from unwanted updates. This is where validation functions come into play. With a proper validation function, even a determined hacker cannot get unwanted documents into your database. Let’s look at how Sofa’s validation function works. For more on validation functions, see Chapter 7.

function (newDoc, oldDoc, userCtx) {
  // !code lib/validate.js

This line imports a library from Sofa that makes the rest of the function much more readable. It is just a wrapper around the basic ability to mark requests as either forbidden or unauthorized. In this chapter, we’ve concentrated on the business logic of the validation function. Just be aware that unless you use Sofa’s validate.js, you’ll need to work with the more primitive logic that the library abstracts.

  unchanged("type");
  unchanged("author");
  unchanged("created_at");

These lines do just what they say. If the document’s type, author, or created_at fields are changed, they throw an error saying the update is forbidden. Note that these lines make no assumptions about the content of these fields. They merely state that updates must not change the content from one revision of the document to the next.
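Sofa’s lib/validate.js itself isn’t listed in this chapter. To give a rough idea of what such a library provides, here is a hedged sketch that defines helpers with the names this function uses and runs the checks walked through in this section, including the author and post checks that follow. The helpers are unchanged, unauthorized, isAdmin, require, and assert.

Several parts are assumptions for illustration, not Sofa’s code:

- The helper bodies, the validateDocUpdate and outcome() names, and the sample post are invented for this sketch.
- isAdmin tests for CouchDB’s "_admin" role, which may not be exactly how Sofa does it.
- dateFormat is left out because its expected format isn’t given here.

```javascript
// Sketch of a Sofa-style validate_doc_update with its helpers inlined, the
// way the "// !code lib/validate.js" line pulls the library into the
// function. The helper bodies are assumptions, not Sofa's validate.js.
function validateDocUpdate(newDoc, oldDoc, userCtx) {
  function forbidden(message) { throw({ forbidden: message }); }
  function unauthorized(message) { throw({ unauthorized: message }); }
  function assert(ok, message) { if (!ok) forbidden(message); }
  function isAdmin(ctx) { return ctx.roles.indexOf("_admin") !== -1; }

  // Accepts any number of field names: require("title", "body", ...).
  function require(...fields) {
    fields.forEach((field) => assert(newDoc[field], "Document must have a " + field));
  }

  // Compare serialized values so arrays and objects with equal contents match.
  function unchanged(field) {
    if (oldDoc && JSON.stringify(oldDoc[field]) !== JSON.stringify(newDoc[field])) {
      forbidden("Field can't be changed: " + field);
    }
  }

  unchanged("type");
  unchanged("author");
  unchanged("created_at");
  // dateFormat("created_at") is omitted: its expected format isn't shown here.

  if (!isAdmin(userCtx) && newDoc.author && newDoc.author !== userCtx.name) {
    unauthorized("Only " + newDoc.author + " may edit this document.");
  }
  if (newDoc._deleted) return;
  if (newDoc.type === "post") {
    require("created_at", "author", "body", "html", "format", "title", "slug");
    assert(newDoc.slug === newDoc._id, "Post slugs must be used as the _id.");
  }
}

// Returns "ok", or the kind of error the validation function threw.
function outcome(newDoc, oldDoc, userCtx) {
  try {
    validateDocUpdate(newDoc, oldDoc, userCtx);
    return "ok";
  } catch (err) {
    return Object.keys(err)[0]; // "forbidden" or "unauthorized"
  }
}

const jan = { name: "jan", roles: [] };
const post = {
  _id: "hello-world", slug: "hello-world", type: "post", author: "jan",
  created_at: "2009-12-01T10:00:00Z", body: "Hi", html: "<p>Hi</p>",
  format: "markdown", title: "Hello World",
};
console.log(outcome(post, null, jan));                                  // ok
console.log(outcome({ ...post, title: "" }, null, jan));                // forbidden
console.log(outcome({ ...post, author: "noah" }, post, jan));           // forbidden (author changed)
console.log(outcome(post, null, { name: "chris", roles: [] }));         // unauthorized
console.log(outcome(post, null, { name: "chris", roles: ["_admin"] })); // ok
```

Inner function declarations named require and assert only shadow the outer names inside validateDocUpdate, so the sketch still runs as an ordinary Node.js script.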
      if (newDoc.created_at) dateFormat("created_at");

The dateFormat helper makes sure that the date (if one is provided) is in the format that Sofa’s views expect.

      // docs with authors can only be saved by their author
      // admin can author anything...
      if (!isAdmin(userCtx) && newDoc.author && newDoc.author != userCtx.name) {
        unauthorized("Only " + newDoc.author + " may edit this document.");
      }

If the person saving the document is an admin, let the edit proceed. Otherwise, make certain that the author and the person saving the document are the same. This ensures that authors may edit only their own posts.

      // authors and admins can always delete
      if (newDoc._deleted) return true;

The next block of code will check the validity of various types of documents. However, deletions will normally not be valid according to those specifications, because their content is just _deleted: true, so we short-circuit the validation function here.

      if (newDoc.type == 'post') {
        require("created_at", "author", "body", "html", "format", "title", "slug");
        assert(newDoc.slug == newDoc._id, "Post slugs must be used as the _id.");
      }
    }

Finally, we have the validation for the actual post document itself. Here we require the fields that are particular to the post document. Because we’ve validated that they are present, we can count on them in views and user interface code.

Save Your First Post

Let’s see how this all works together! Fill out the form with some practice data, and hit “save” to see a success response. Figure 12-4 shows how JavaScript has used HTTP to PUT the document to a URL constructed of the database name plus the document ID. It also shows how the document is just sent as a JSON string in the body of the PUT request. If you were to GET the document URL, you’d see the same set of JSON data, with the addition of the _rev field as applied by CouchDB.

Figure 12-4. JSON over HTTP to save the blog post

To see the JSON version of the document you’ve saved, you can also browse to it in Futon. Visit http://127.0.0.1:5984/_utils/database.html?blog/_all_docs and you should see a document with an ID corresponding to the one you just saved. Click it to see what Sofa is sending to CouchDB.

Wrapping Up

We’ve covered how to design JSON formats for your application, how to enforce those designs with validation functions, and the basics of how documents are saved. In the next chapter, we’ll show how to load documents from CouchDB and display them in the browser.

CHAPTER 13
Showing Documents in Custom Formats

CouchDB’s show functions are a RESTful API inspired by a similar feature in Lotus Notes. In a nutshell, they allow you to serve documents to clients, in any format you choose.

A show function builds an HTTP response with any Content-Type, based on a stored JSON document. For Sofa, we’ll use them to show the blog post permalink pages. This will ensure that these pages are indexable by search engines, as well as make the pages more accessible. Sofa’s show function displays each blog post as an HTML page, with links to stylesheets and other assets, which are stored as attachments to Sofa’s design document.

Hey, this is great: we’ve rendered a blog post! See Figure 13-1.

Figure 13-1. A rendered post

The complete show function and template will render a static, cacheable resource that does not depend on details about the current user or anything else aside from the requested document and Content-Type. Generating HTML from a show function will not cause any side effects in the database, which has positive implications for building simple scalable applications.

Rendering Documents with Show Functions

Let’s look at the source code.
The first thing we’ll see is the JavaScript function body, which is very simple: it just runs a template function to generate the HTML page. Let’s break it down:

    function(doc, req) {
      // !json templates.post
      // !json blog
      // !code vendor/couchapp/template.js
      // !code vendor/couchapp/path.js

We’re familiar with the !code and !json macros from Chapter 12. In this case, we’re using them to import a template and some metadata about the blog (as JSON data), as well as to include link and template rendering functions as inline code.

Next, we render the template:

      return template(templates.post, {
        title : doc.title,
        blogName : blog.title,
        post : doc.html,
        date : doc.created_at,
        author : doc.author,

The blog post title, HTML body, author, and date are taken from the document, with the blog’s title included from its JSON value. The next three calls all use the path.js library to generate links based on the request path. This ensures that links within the application are correct.

        assets : assetPath(),
        editPostPath : showPath('edit', doc._id),
        index : listPath('index', 'recent-posts', {descending:true, limit:5})
      });
    }

So we’ve seen that the function body itself just calculates some values (based on the document, the request, and some deployment specifics, like the name of the database) to send to the template for rendering. The real action is in the HTML template. Let’s take a look.

The Post Page Template

The template defines the output HTML, with the exception of a few tags that are replaced with dynamic content. In Sofa’s case, the dynamic tags look like <%= replace_me %>, which is a common templating tag delimiter.

The template engine used by Sofa is adapted from John Resig’s blog post, “JavaScript Micro-Templating”. It was chosen as the simplest one that worked in the server-side context without modification.
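To show the idea behind <%= name %> tags, here is a deliberately tiny substitution function. It is not John Resig’s micro-templating code, which compiles a template into a JavaScript function and also supports <% code %> blocks; render() is a hypothetical stand-in that only replaces <%= name %> tags with values from a data object and performs no HTML escaping.

```javascript
// Minimal stand-in for a <%= key %> template engine (illustration only).
// Each <%= name %> tag is replaced by data[name]; unknown keys become "".
function render(template, data) {
  return template.replace(/<%=\s*(\w+)\s*%>/g, function (match, key) {
    return key in data ? String(data[key]) : "";
  });
}

const page = render(
  "<h1><%= title %></h1><p>by <%= author %></p>",
  { title: "Hello World", author: "jchris" }
);
```

Because the show function only hands the template a plain object of values, swapping this for a fuller engine changes nothing about the show function’s contract.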
Using a different template engine would be a simple exercise.

Let’s look at the template string. Remember that it is included in the JavaScript using the CouchApp !json macro, so that CouchApp can handle escaping it and including it to be used by the templating engine.

    function(doc, req) {
      return '<p>' + doc.title + '</p>';
    }

When run with a document that has a field called title with the content “Hello World,” this function will send an HTTP response with the default Content-Type of text/html, the UTF-8 character encoding, and the body <p>Hello World</p>.

The simplicity of the request/response cycle of a show function is hard to overstate. The most common question we hear is, “How can I load another document so that I can render its content as well?” The short answer is that you can’t. The longer answer is that for some applications you might use a list function to render a view result as HTML, which gives you the opportunity to use more than one document as the input of your function.

The basic function from a document and a request to a response, with no side effects and no alternative inputs, stays the same even as we start using more advanced features. Here’s a more complex show function illustrating the ability to set custom headers:

    function(doc, req) {
      return {
        body : '<foo>' + doc.title + '</foo>',
        headers : {
          "Content-Type" : "application/xml",
          "X-My-Own-Header": "you can set your own headers"
        }
      };
    }

If this function were called with the same document as we used in the previous example, the response would have a Content-Type of application/xml and the body <foo>Hello World</foo>. You should be able to see from this how you’d be able to use show functions to generate any output you need, from any of your documents.

Popular uses of show functions are for outputting HTML pages, CSV files, or XML needed for compatibility with a particular interface. The CouchDB test suite even illustrates using show functions to output a PNG image.
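The two show functions above differ only in what they return. As an illustration, this sketch mimics how each return value maps to a response: a bare string becomes the body with the default text/html Content-Type, while an object supplies its own body and headers, overriding the default. runShow is a hypothetical harness written for this example; it does not model CouchDB’s internals.

```javascript
// Hypothetical harness: turns a show function's return value into a
// response object the way the text describes (illustration only).
function runShow(showFn, doc, req) {
  const defaults = { "Content-Type": "text/html; charset=utf-8" };
  const result = showFn(doc, req);
  if (typeof result === "string") {
    return { body: result, headers: defaults };
  }
  // Headers returned by the function override the defaults.
  return { body: result.body, headers: Object.assign({}, defaults, result.headers) };
}

function simpleShow(doc, req) {
  return "<p>" + doc.title + "</p>";
}

function xmlShow(doc, req) {
  return {
    body: "<foo>" + doc.title + "</foo>",
    headers: {
      "Content-Type": "application/xml",
      "X-My-Own-Header": "you can set your own headers"
    }
  };
}

const helloDoc = { _id: "hello", title: "Hello World" };
const htmlResp = runShow(simpleShow, helloDoc, {});
const xmlResp = runShow(xmlShow, helloDoc, {});
```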
To output binary data, there is the option to return a Base64-encoded string, like this:

    function(doc, req) {
      return {
        base64 : ["iVBORw0KGgoAAAANSUhEUgAAABAAAAAQCAMAAAAoLQ9TAAAAsV",
          "BMVEUAAAD////////////////////////5ur3rEBn////////////////wDBL/",
          "AADuBAe9EB3IEBz/7+//X1/qBQn2AgP/f3/ilpzsDxfpChDtDhXeCA76AQH/v7",
          "/84eLyWV/uc3bJPEf/Dw/uw8bRWmP1h4zxSlD6YGHuQ0f6g4XyQkXvCA36MDH6",
          "wMH/z8/yAwX64ODeh47BHiv/Ly/20dLQLTj98PDXWmP/Pz//39/wGyJ7Iy9JAA",
          "AADHRSTlMAbw8vf08/bz+Pv19jK/W3AAAAg0lEQVR4Xp3LRQ4DQRBD0QqTm4Y5",
          "zMxw/4OleiJlHeUtv2X6RbNO1Uqj9g0RMCuQO0vBIg4vMFeOpCWIWmDOw82fZx",
          "vaND1c8OG4vrdOqD8YwgpDYDxRgkSm5rwu0nQVBJuMg++pLXZyr5jnc1BaH4GT",
          "LvEliY253nA3pVhQqdPt0f/erJkMGMB8xucAAAAASUVORK5CYII="].join(''),
        headers : {
          "Content-Type" : "image/png"
        }
      };
    }

This function outputs a 16×16 pixel version of the CouchDB logo. The JavaScript code necessary to generate images from document contents would likely be quite complex, but the ability to send Base64-encoded binary data means that query servers written in other languages like C or PHP have the ability to output any data type.

Side Effect–Free

We’ve mentioned that a key constraint of show functions is that they are side effect–free. This means that you can’t use them to update documents, kick off background processes, or trigger any other function. In the big picture, this is a good thing, as it allows CouchDB to give performance and reliability guarantees that standard web frameworks can’t. Because a show function will always return the same result given the same input and can’t change anything about the environment in which it runs, its output can be cached and intelligently reused. In a high-availability deployment with proper caching, this means that a given show function will be called only once for any particular document, and the CouchDB server may not even be contacted for subsequent requests.
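Because the output depends only on the inputs, a cache can key rendered output on the document’s _id and _rev. The following in-memory sketch renders each revision once. cachedShow is hypothetical; real deployments get this effect from Etags and an HTTP proxy cache, as described later in this chapter. The cache key ignores req, so the sketch is only valid for show functions whose output does not depend on the request, which is also why heavy use of query parameters lowers cache effectiveness.

```javascript
// Hypothetical wrapper: memoizes a pure show function per document revision.
// The key ignores req, so use it only when output depends on the doc alone.
function cachedShow(showFn) {
  const cache = new Map();
  let renders = 0;
  function render(doc, req) {
    const key = doc._id + "@" + doc._rev; // a new revision means a new key
    if (!cache.has(key)) {
      renders += 1;
      cache.set(key, showFn(doc, req));
    }
    return cache.get(key);
  }
  render.renderCount = function () { return renders; };
  return render;
}

const titleShow = cachedShow(function (doc, req) {
  return "<p>" + doc.title + "</p>";
});

const rev1 = { _id: "hello", _rev: "1-abc", title: "Hello World" };
const rev2 = { _id: "hello", _rev: "2-def", title: "Hello Again" };
titleShow(rev1, {});
titleShow(rev1, {});                // same revision: served from the cache
const latest = titleShow(rev2, {}); // new revision: rendered again
```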
Working without side effects can be a little bit disorienting for developers who are used to the anything-goes approach offered by most application servers. It’s considered best practice to ensure that actions run in response to GET requests are side effect–free and cacheable, but rarely do we have the discipline to achieve that goal. CouchDB takes a different tack: because it’s a database, not an application server, we think it’s more important to enforce best practices (and ensure that developers don’t write functions that adversely affect the database server) than offer absolute flexibility. Once you’re used to working within these constraints, they start to make a lot of sense. (There’s a reason they are considered best practices.)

Design Documents

Before we look into show functions themselves, we’ll quickly review how they are stored in design documents. CouchDB looks for show functions stored in a top-level field called shows, which is named like this to be parallel with views, lists, and filters. Here’s an example design document that defines two show functions:

    {
      "_id" : "_design/show-function-examples",
      "shows" : {
        "summary" : "function(doc, req){ ... }",
        "detail" : "function(doc, req){ ... }"
      }
    }

There’s not much to note here except the fact that design documents can define multiple show functions. Now let’s see how these functions are run.

Querying Show Functions

We’ve described the show function API, but we haven’t yet seen how these functions are run. The show function lives inside a design document, so to invoke it we append the name of the function to the design document itself, and then the ID of the document we want to render:

    GET /mydb/_design/mydesign/_show/myshow/72d43a93eb74b5f2

Because show functions (and the others like list, etc.)
are available as resources within the design document path, all resources provided by a particular design document can be found under a common root, which makes custom application proxying simpler. We’ll see an example of this in Part III.

If the document with ID 72d43a93eb74b5f2 does not exist, the request will result in an HTTP 500 Internal Server Error response. This seems a little harsh; why does it happen? If we query a show function with a document ID that doesn’t point to an existing document, the doc argument in the function is null. Then the show function tries to access it, and the JavaScript interpreter doesn’t like that. So it bails out. To secure against these errors, or to handle nonexistent documents in a custom way (e.g., a wiki could display a “create new page” page), you can wrap the code in our function with if (doc !== null) { ... }.

However, show functions can also be called without a document ID at all, like this:

    GET /mydb/_design/mydesign/_show/myshow

In this case, the doc argument to the function has the value null. This option is useful in cases where the show function can make sense without a document. For instance, in the example application we’ll explore in Part III, we use the same show function to provide for editing existing blog posts when a DocID is given, as well as for composing new blog posts when no DocID is given. The alternative would be to maintain an alternate resource (likely a static HTML attachment) with parallel functionality. As programmers, we strive not to repeat ourselves, which motivated us to give show functions the ability to run without a document ID.

Design Document Resources

In addition to the ability to run show functions, other resources are available within the design document path.
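The URL layout described in this section can be captured in a small helper. buildShowUrl is hypothetical (Sofa’s path.js has its own showPath helper, with a different signature); it illustrates that everything a design document serves hangs off the common root /db/_design/ddoc, and that the document ID segment is optional.

```javascript
// Hypothetical helper (not part of CouchDB or CouchApp): builds the path of
// a show function under its design document's common root.
function buildShowUrl(db, ddoc, show, docId) {
  const root = "/" + encodeURIComponent(db) + "/_design/" + encodeURIComponent(ddoc);
  let url = root + "/_show/" + encodeURIComponent(show);
  if (docId !== undefined && docId !== null) {
    // Omitting the ID is allowed; the show function then receives doc === null.
    url += "/" + encodeURIComponent(docId);
  }
  return url;
}

const withDoc = buildShowUrl("mydb", "mydesign", "myshow", "72d43a93eb74b5f2");
const withoutDoc = buildShowUrl("mydb", "mydesign", "myshow");
```

Escaping each segment keeps document IDs that contain characters such as a slash from being read as extra path segments.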
This combination of features within the design document resource means that applications can be deployed without exposing the full CouchDB API to visitors, with only a simple proxy to rewrite the paths. We won’t go into full detail here, but the gist of it is that end users would run the previous query from a path like this:

    GET /_show/myshow/72d43a93eb74b5f2

Under the covers, an HTTP proxy can be programmed to prepend the database and design document portion of the path (in this case, /mydb/_design/mydesign) so that CouchDB sees the standard query. With such a system in place, end users can access the application only via functions defined on the design document, so developers can enforce constraints and prevent access to raw JSON document and view data. While it doesn’t provide 100% security, using custom rewrite rules is an effective way to control the access end users have to a CouchDB application. This technique has been used in production by a few websites at the time of this writing.

Query Parameters

The request object (including helpfully parsed versions of query parameters) is available to show functions as well. By way of illustration, here’s a show function that returns different data based on the URL query parameters:

    function(doc, req) {
      return "<p>Aye aye, " + req.query.parrot + "!</p>";
    }

Requesting this function with a query parameter will result in the query parameter being used in the output:

    GET /mydb/_design/mydesign/_show/myshow?parrot=Captain

In this case, we’ll see the output: <p>Aye aye, Captain!</p>
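A slightly fuller sketch uses a query parameter to switch the output format, the kind of use this section recommends. The parameter name format and the two formats offered are our own choices for illustration; nothing in CouchDB requires them.

```javascript
// Sketch: a show function that picks its output format from req.query.
// The "format" parameter and the html/json choices are illustrative.
function formatShow(doc, req) {
  const format = (req.query && req.query.format) || "html";
  if (format === "json") {
    return {
      body: JSON.stringify({ title: doc.title }),
      headers: { "Content-Type": "application/json" }
    };
  }
  return "<p>" + doc.title + "</p>"; // default: HTML
}

const post = { _id: "hello", title: "Hello World" };
const asHtml = formatShow(post, { query: {} });
const asJson = formatShow(post, { query: { format: "json" } });
```

Each format gets its own URL (for example, ?format=json), so every variant stays individually cacheable.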
Allowing URL parameters into the function does not affect cacheability, as each unique invocation results in a distinct URL. However, making heavy use of this feature will lower your cache effectiveness. Query parameters like this are most useful for doing things like switching the mode or the format of the show function output. It’s recommended that you avoid using them for things like inserting custom content (such as requesting the user’s nickname) into the response, as that will mean each user’s data must be cached separately.

Accept Headers

Part of the HTTP spec allows for clients to give hints to the server about which media types they are capable of accepting. At this time, the JavaScript query server shipped with CouchDB 0.10.0 contains helpers for working with Accept headers. However, web browser support for Accept headers is very poor, which has prompted frameworks such as Ruby on Rails to remove their support for them. CouchDB may or may not follow suit here, but the fact remains that you are discouraged from relying on Accept headers for applications that will be accessed via web browsers.

There is a suite of helpers for Accept headers present that allow you to specify the format in a query parameter as well. For instance:

    GET /db/_design/app/_show/post
    Accept: application/xml

is equivalent to a similar URL with mismatched Accept headers:

    GET /db/_design/app/_show/post?format=xml
    Accept: x-foo/whatever

This is because browsers don’t use sensible Accept headers for feed URLs. Browsers 1, Accept headers 0. Yay browsers.

These helpers allow developers to switch response Content-Types based on the client’s request. The next example adds the ability to return either HTML, XML, or a developer-designated media type: x-foo/whatever. CouchDB’s main.js library provides the provides("format", render_function) function, which makes it easy for developers to handle client requests for multiple MIME types in one show function.
This function also shows off the use of registerType(name, mime_types), which adds new types to mapping objects used by respondWith. The end result is ultimate flexibility for developers, with an easy interface for handling different types of requests. main.js uses a JavaScript port of Mimeparse, an open source reference implementation, to provide this service.

Etags

We’ve mentioned that show function requests are side effect–free and cacheable, but we haven’t discussed the mechanism used to accomplish this. Etags are a standard HTTP mechanism for indicating whether a cached copy of an HTTP response is still current. Essentially, when the client makes its first request to a resource, the response is accompanied by an Etag, which is an opaque string token unique to the version of the resource requested. The second time the client makes a request against the same resource, it sends along the original Etag with the request. If the server determines that the Etag still matches the resource, it can avoid sending the full response, instead replying with a message that essentially says, “You have the latest version already.”

When implemented properly, the use of Etags can cut down significantly on server load. CouchDB provides an Etag header, so that by using an HTTP proxy cache like Squid, you’ll instantly remove load from CouchDB.

Functions and Templates

CouchDB’s process runner looks only at the functions stored under shows, but we’ll want to keep the template HTML separate from the content negotiation logic. The couchapp script handles this for us, using the !code and !json handlers. Let’s follow the show function logic through the files that Sofa splits it into.
Here’s Sofa’s edit show function:

    function(doc, req) {
      // !json templates.edit
      // !json blog
      // !code vendor/couchapp/path.js
      // !code vendor/couchapp/template.js

      // we only show html
      return template(templates.edit, {
        doc : doc,
        docid : toJSON((doc && doc._id) || null),
        blog : blog,
        assets : assetPath(),
        index : listPath('index', 'recent-posts', {descending:true, limit:8})
      });
    }

This should look pretty straightforward. First, we have the function’s head, or signature, that tells us we are dealing with a function that takes two arguments: doc and req. The next four lines are comments, as far as JavaScript is concerned. But these are special comments. The CouchApp upload script knows how to read these special comments on top of the show function. They include macros; a macro starts with a bang (!) and a name. Currently, CouchApp supports the two macros !json and !code.

The !json Macro

The !json macro takes one argument: the path to a file in the CouchApp directory hierarchy in dot notation. Instead of a slash (/) or backslash (\), you use a dot (.). The !json macro then reads the contents of the file and puts them into a variable that has the same name as the file’s path in dot notation. For example, if you use the macro like this:

    // !json template.edit

CouchApp will read the file template/edit.* and place its contents into a variable:

    var template.edit = "contents of edit.*"

When specifying the path, you omit the file’s extension. That way you can read .json, .js, or .html files, or any other files into variables in your functions. Because the macro matches files with any extensions, you can’t have two files with the same name but different extensions.

In addition, you can specify a directory and CouchApp will load all the files in this directory and any subdirectory.
So this:

    // !json template

creates:

    var template.edit = "contents of edit.*"
    var template.post = "contents of post.*"

Note that the macro also takes care of creating the top-level template variable. We just omitted that here for brevity. The !json macro will generate only valid JavaScript.

The !code Macro

The !code macro is similar to the !json macro, but it serves a slightly different purpose. Instead of making the contents of one or more files available as variables in your functions, it replaces itself with the contents of the file referenced in the argument to the macro. This is useful for sharing library functions between CouchDB functions (map/reduce/show/list/validate) without having to maintain their source code in multiple places. Our example shows this line:

    // !code vendor/couchapp/path.js

If you look at the CouchApp sources, there is a file in vendor/couchapp/path.js that includes a bunch of useful functions related to the URL path of a request. In the example just shown, CouchApp will replace the line with the contents of path.js, making the functions locally available to the show function. The !code macro can load only a single file at a time.

Learning Shows

Before we dig into the full code that will render the post permalink pages, let’s look at some Hello World form examples. The first one shows just the function arguments and the simplest possible return value. See Figure 8-1.

Figure 8-1. Basic form function

A show function is a JavaScript function that converts a document and some details about the HTTP request into an HTTP response. Typically it will be used to construct HTML, but it is also capable of returning Atom feeds, images, or even just filtered JSON. The document argument is just like the documents passed to map functions.
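As a small example of the “filtered JSON” case, here is a sketch of a show function that exposes only a whitelist of fields. The field list is an assumption made for illustration. The null guard covers the case described earlier in this chapter, where doc is null because no document ID was given or the ID matched nothing; the code field is what CouchDB uses as the HTTP status of a show response.

```javascript
// Sketch: a show function returning filtered JSON. Only whitelisted fields
// leave the server; the whitelist itself is an illustrative assumption.
function publicFields(doc, req) {
  if (doc === null) {
    // No document ID was given, or the ID does not match a document.
    return { code: 404, body: "No such post" };
  }
  var allowed = ["_id", "title", "author", "created_at"];
  var out = {};
  allowed.forEach(function (field) {
    if (doc[field] !== undefined) out[field] = doc[field];
  });
  return {
    body: JSON.stringify(out),
    headers: { "Content-Type": "application/json" }
  };
}

const filtered = publicFields(
  { _id: "hello", _rev: "1-abc", title: "Hello", author: "jchris", draft_notes: "private" },
  {}
);
```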
Using Templates

The only thing missing from the show function development experience is the ability to render HTML without ruining your eyes looking at a whole lot of string manipulation, among other unpleasantries. Most programming environments solve this problem with templates: documents that look like HTML but have portions of their content filled out dynamically.

Dynamically combining template strings and data in JavaScript is a solved problem. However, it hasn’t caught on, partly because JavaScript doesn’t have very good support for multi-line “heredoc” strings. After all, once you get through escaping quotes and leaving out newlines, it’s not much fun to edit HTML templates inlined into JavaScript code. We’d much rather keep our templates in separate files, where we can avoid all the escaping work, and they can be syntax-highlighted by our editor.

The couchapp script has a couple of helpers to make working with templates and library code stored in design documents less painful. In the function shown in Figure 8-2, we use them to load a blog post template, as well as the JavaScript function responsible for rendering it.

Figure 8-2. The blog post template

As you can see, we take the opportunity in the function to strip JavaScript tags from the form post. That regular expression is not secure, and the blogging application is meant to be written to only by its owners, so we should probably drop the regular expression and simplify the function to avoid transforming the document, instead passing it directly to the template. Or we should port a known-good sanitization routine from another language and provide it in the templates library.
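For fields that should appear as plain text, such as a post title, escaping is a simpler and more reliable measure than stripping script tags with a regular expression. A minimal sketch follows; escapeHtml is our own helper, not part of the templates library. It performs output escaping, not HTML sanitization, so it is no substitute for a known-good sanitizer when a field is meant to contain HTML (as Sofa’s rendered post body is).

```javascript
// Output escaping for text fields (not an HTML sanitizer). "&" is replaced
// first so the entities produced by later steps are not escaped twice.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

const unsafeTitle = '<script>alert("hi")</script> & more';
const safeTitle = escapeHtml(unsafeTitle);
```

The browser then displays the title’s markup literally instead of executing it.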
Writing Templates

Working with templates, instead of trying to cram all the presentation into one file, makes editing forms a little more relaxing. The templates are stored in their own file, so you don’t have to worry about JavaScript or JSON encoding, and your text editor can highlight the template’s HTML syntax.

CouchDB’s JavaScript query server includes the E4X extensions for JavaScript, which can be helpful for XML templates but do not work well for HTML. We’ll explore E4X templates in Chapter 14 when we cover list functions, which make providing an Atom feed of view results easy and memory efficient.

Trust us when we say that looking at this HTML page is much more relaxing than trying to understand what a raw JavaScript one is trying to do. The template library we’re using in the example blog is by John Resig and was chosen for simplicity. It could easily be replaced by one of many other options, such as the Django template language, available in JavaScript.

This is a good time to note that CouchDB’s architecture is designed to make it simple to swap out languages for the query servers. With a query server written in Lisp, Python, or Ruby (or any language that supports JSON and stdio), you could have an even wider variety of templating options. However, the CouchDB team recommends sticking with JavaScript as it provides the highest level of support and interoperability, though other options are available.

CHAPTER 9
Transforming Views with List Functions

Just as show functions convert documents to arbitrary output formats, CouchDB list functions allow you to render the output of view queries in any format.
The powerful iterator API allows for flexibility to filter and aggregate rows on the fly, as well as output raw transformations for an easy way to make Atom feeds, HTML lists, CSV files, config files, or even just modified JSON.

List functions are stored under the lists field of a design document. Here’s an example design document that contains two list functions:

    {
      "_id" : "_design/foo",
      "_rev" : "1-67at7bg",
      "lists" : {
        "bar" : "function(head, req) { var row; while (row = getRow()) { ... } }",
        "zoom" : "function() { return 'zoom!' }"
      }
    }

Arguments to the List Function

The function is called with two arguments, which can sometimes be ignored, as the row data itself is loaded during function execution. The first argument, head, contains information about the view. Here’s what you might see looking at a JSON representation of head:

    {"total_rows": 10, "offset": 0}

The request itself is a much richer data structure. This is the same request object that is available to show, update, and filter functions. We’ll go through it in detail here as a reference. Here’s the example req object:

    {
      "info": {"db_name": "test_suite_db", "doc_count": 11, "doc_del_count": 0,
        "update_seq": 11, "purge_seq": 0, "compact_running": false, "disk_size": 4930,
        "instance_start_time": "1250046852578425", "disk_format_version": 4},

The database information, as available in an information request against a database’s URL, is included in the request parameters. This allows you to stamp rendered rows with an update sequence and know the database you are working with.

      "method": "GET",
      "path": ["test_suite_db","_design","lists","_list","basicJSON","basicView"],

The HTTP method and the path from the client request are useful, especially for rendering links to other resources within the application.

      "query": {"foo":"bar"},

If there are parameters in the query string (in this case corresponding to ?foo=bar), they will be parsed and available as a JSON object at req.query.

      "headers": {"Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
        "Accept-Charset": "ISO-8859-1,utf-8;q=0.7,*;q=0.7", "Accept-Encoding": "gzip,deflate",
        "Accept-Language": "en-us,en;q=0.5", "Connection": "keep-alive",
        "Cookie": "_x=95252s.sd25; AuthSession=", "Host": "127.0.0.1:5984",
        "Keep-Alive": "300",
        "Referer": "http://127.0.0.1:5984/_utils/couch_tests.html?script/couch_tests.js",
        "User-Agent": "Mozilla/5.0 Gecko/20090729 Firefox/3.5.2"},
      "cookie": {"_x": "95252s.sd25", "AuthSession": ""},

Headers give list and show functions the ability to provide the Content-Type response that the client prefers, as well as other nifty things like cookies. Note that cookies are also parsed into a JSON representation. Thanks, MochiWeb!

      "body": "undefined",
      "form": {},

In the case where the method is POST, the request body (and a form-decoded JSON representation of it, if applicable) is available as well.

      "userCtx": {"db": "test_suite_db", "name": null, "roles": ["_admin"]}
    }

Finally, the userCtx is the same as that sent to the validation function. It provides access to the database the user is authenticated against, the user’s name, and the roles they’ve been granted. In the previous example, you see an anonymous user working with a CouchDB node that is in “admin party” mode. Unless an admin is specified, everyone is an admin.

That’s enough about the arguments to list functions. Now it’s time to look at the mechanics of the function itself.

An Example List Function

Let’s put this knowledge to use. In the chapter introduction, we mentioned using lists to generate config files.
One fun thing about this is that if you keep your configuration information in CouchDB and generate it with lists, you don’t have to worry about being able to regenerate it again, because you know the config will be generated by a pure function from your database and not other sources of information. This level of isolation will ensure that your config files can be generated correctly as long as CouchDB is running. Because you can’t fetch data from other system services, files, or network sources, you can’t accidentally write a config file generator that fails due to external factors.

J. Chris got excited about the idea of using list functions, driven by data in CouchDB, to generate config files for the sort of services people usually configure with tools such as Chef, an Apache-licensed infrastructure automation tool. The key feature of infrastructure automation is that deployment scripts are idempotent: running your scripts multiple times will have the same intended effect as running them once, something that becomes critical when a script fails halfway through. This encourages crash-only design, where your scripts can bomb out multiple times but your data remains consistent, because it takes the guesswork out of provisioning and updating servers in the case of previous failures.

Like map, reduce, and show functions, lists are pure functions, from a view query and an HTTP request to an output format. They can’t make queries against remote services or otherwise access outside data, so you know they are repeatable. Using a list function to generate an HTTP server configuration file ensures that the configuration is generated repeatably, based on only the state of the database.

Imagine you are running a shared hosting platform, with one name-based virtual host per user. You’ll need a config file that starts out with some node configuration (which modules to use, etc.)
and is followed by one config section per user, setting things like the user’s HTTP directory, subdomain, forwarded ports, etc.

    function(head, req) {
      // helper function definitions would be here...
      var row, userConf, configHeader, configFoot;
      configHeader = renderTopOfApacheConf(head, req.query.hostname);
      send(configHeader);

In the first block of the function, we’re rendering the top of the config file using the function renderTopOfApacheConf(head, req.query.hostname). This may include information that’s passed into the function through the request, like the internal name of the server that is being configured or the root directory in which user HTML files are organized. We won’t show the function body, but you can imagine that it would return a long multi-line string that handles all the global configuration for your server and sets the stage for the per-user configuration that will be based on view data.

The call to send(configHeader) is the heart of your ability to render text using list functions. Put simply, it sends an HTTP chunk to the client, with the content of the string passed to it. There is some batching behind the scenes, as CouchDB speaks with the JavaScript runner over a synchronous protocol, but from the perspective of a programmer, send() is how HTTP chunks are born.

Now that we’ve rendered and sent the file’s head, it’s time to start rendering the list itself. Each list item will be the result of converting a view row to a virtual host’s configuration element. The first thing we do is call getRow() to get a row of the view.

      while (row = getRow()) {
        userConf = renderUserConf(row);
        send(userConf);
      }

The while loop used here will continue to run until getRow() returns null, which is how CouchDB signals to the list function that all valid rows (based on the view query parameters) have been exhausted.
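To see what send() and getRow() do outside of CouchDB, here is a minimal sketch that stubs both globals and runs the opening block and loop described above under Node.js. The stubs, the run() helper, and the bodies of renderTopOfApacheConf and renderUserConf are hypothetical stand-ins written for this illustration; inside CouchDB the JavaScript query server supplies the real getRow() and send(), and it batches chunks, so boundaries on the wire may differ. Running it twice on the same rows also illustrates the purity argument: the output depends only on the rows and the request.

```javascript
// Stand-ins for the getRow()/send() globals that CouchDB's JavaScript
// query server provides to list functions; written only for this sketch.
let pendingRows = [];
let chunks = [];
function getRow() {
  // Returns the next view row, or null once the rows are exhausted.
  return pendingRows.length > 0 ? pendingRows.shift() : null;
}
function send(chunk) {
  // Each call corresponds to one chunk of the HTTP response.
  chunks.push(chunk);
}

// Hypothetical helpers; the row layout { key: user, value: { subdomain } } is assumed.
function renderTopOfApacheConf(head, hostname) {
  return "# config for " + hostname + "\n";
}
function renderUserConf(row) {
  return "# " + row.key + " -> " + row.value.subdomain + "\n";
}

// The opening block and the loop from the text.
function renderHeadAndRows(head, req) {
  let row, userConf;
  send(renderTopOfApacheConf(head, req.query.hostname));
  while ((row = getRow())) {
    userConf = renderUserConf(row);
    send(userConf);
  }
}

// Feed rows to the function and return the chunks it produced.
function run(rows, req) {
  pendingRows = rows.slice(); // copy so the caller's array is not consumed
  chunks = [];
  renderHeadAndRows({ total_rows: rows.length, offset: 0 }, req);
  return chunks.slice();
}

const rows = [
  { id: "u1", key: "alice", value: { subdomain: "alice.example.com" } },
  { id: "u2", key: "bob", value: { subdomain: "bob.example.com" } },
];
const req = { query: { hostname: "foobar" } };

const first = run(rows, req);
const second = run(rows, req);
console.log(first.length); // → 3 (one header chunk plus one chunk per row)
console.log(JSON.stringify(first) === JSON.stringify(second)); // → true
```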
Before we get ahead of ourselves, let’s check out what happens when we do get a row. In this case, we simply render a string based on the row and send it to the client. Once all rows have been rendered, the loop is complete.

Now is a good time to note that the function has the option to return early. Perhaps it is programmed to stop iterating when it sees a particular user’s document, or based on a tally it’s been keeping of some resource allocated in the configuration. In those cases, the loop can end early with a break statement or other method. There’s no requirement for the list function to render every row that is sent to it.

      configFoot = renderConfTail();
      return configFoot;
    }

Finally, we close out the configuration file and return the final string value to be sent as the last HTTP chunk. The last action of a list function is always to return a string, which will be sent as the final HTTP chunk to the client.

To use our config file generation function in practice, we might run a command-line script that looks like:

    curl "http://localhost:5984/config_db/_design/files/_list/apache/users?hostname=foobar" > apache.conf

This will render our Apache config based on data in the users view and save it to a file. What a simple way to build a reliable configuration generator!

List Theory

Now that we’ve seen a complete list function, it’s worth mentioning some of the helpful properties list functions have. The most obvious is the iterator-style API. Because each row is loaded independently by calling getRow(), it’s easy not to leak memory. When used correctly, the list function API can render lists of arbitrary length without error.
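Putting the fragments together, a complete version of the config generator might look like the sketch below. The helper bodies, the row layout (value.subdomain and value.docRoot), and the maxUsers query parameter used to demonstrate an early break are all assumptions made for illustration; the getRow()/send() stubs stand in for the query server so the function can run under Node.js.

```javascript
// Stand-ins for the query server's getRow()/send(), for running outside CouchDB.
let pendingRows = [];
let chunks = [];
function getRow() { return pendingRows.length > 0 ? pendingRows.shift() : null; }
function send(chunk) { chunks.push(chunk); }

// Hypothetical helpers (the text does not show their bodies).
function renderTopOfApacheConf(head, hostname) {
  return "# Apache config for " + hostname + "\nListen 80\n";
}
function renderUserConf(row) {
  return "<VirtualHost *:80>\n  ServerName " + row.value.subdomain +
    "\n  DocumentRoot " + row.value.docRoot + "\n</VirtualHost>\n";
}
function renderConfTail() {
  return "# end of generated config\n";
}

// The list function, assembled from the fragments in the text.
const apacheList = function (head, req) {
  let row;
  let rendered = 0;
  // Hypothetical ?maxUsers=N parameter, to show that a list may stop early.
  const maxUsers = req.query.maxUsers ? parseInt(req.query.maxUsers, 10) : Infinity;

  send(renderTopOfApacheConf(head, req.query.hostname));
  while ((row = getRow())) {
    if (rendered >= maxUsers) break; // stop early; remaining rows are ignored
    send(renderUserConf(row));
    rendered += 1;
  }
  // The returned string becomes the final HTTP chunk.
  return renderConfTail();
};

// Simulate CouchDB calling the list and collect every chunk it would send.
function runList(listFn, rows, req) {
  pendingRows = rows.slice();
  chunks = [];
  const last = listFn({ total_rows: rows.length, offset: 0 }, req);
  chunks.push(last);
  return chunks.join("");
}

const users = ["alice", "bob", "carol"].map(function (name) {
  return {
    id: name,
    key: name,
    value: { subdomain: name + ".example.com", docRoot: "/home/" + name + "/public_html" },
  };
});

console.log(runList(apacheList, users, { query: { hostname: "foobar" } }));
console.log(runList(apacheList, users, { query: { hostname: "foobar", maxUsers: "1" } }));
```

Query-string values arrive in req.query as strings, which is why the sketch runs maxUsers through parseInt. The break consumes one extra row before stopping, which is harmless: rows the list does not render are simply dropped.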
On the other hand, this API gives you the flexibility to bundle a few rows in a single chunk of output, so if you had a view of, say, user accounts, each followed by the subdomains owned by that account, you could use a slightly more complex loop to build up some state in the list function for rendering more complex chunks. Let’s look at an alternate loop section:

    var subdomainOwnerRow, subdomainRows = [];
    while (row = getRow()) {

We’ve entered a loop that will continue until we have reached the endkey of the view. The view is structured so that a user profile row is emitted, followed by all of that user’s subdomains. We’ll use the profile data and the subdomain information to template the configuration for each individual user. This means we can’t render any subdomain configuration until we know we’ve received all the rows for the current user.

      if (!subdomainOwnerRow) {
        subdomainOwnerRow = row;

This case is true only for the first row (the first user’s profile). We’re merely setting up the initial conditions.

      } else if (row.value.user != subdomainOwnerRow.value.user) {

This is the end case. It will be called only after all the subdomain rows for the current user have been exhausted. It is triggered by a row with a mismatched user, indicating that we have all the subdomain rows.

        send(renderUserConf(subdomainOwnerRow, subdomainRows));

We know we are ready to render everything for the current user, so we pass the profile row and the subdomain rows to a render function (which nicely hides all the gnarly Apache config details from our fair reader). The result is sent to the HTTP client, which writes it to the config file.

        subdomainRows = [];
        subdomainOwnerRow = row;

We’ve finished with that user, so let’s clear the rows and start working on the next user.

      } else {
        subdomainRows.push(row);

Ahh, back to work, collecting rows.
      }
    }
    send(renderUserConf(subdomainOwnerRow, subdomainRows));

This last bit is tricky: after the loop is finished (we’ve reached the end of the view query), we’ve still got to render the last user’s config. Wouldn’t want to forget that!

The gist of this loop section is that we collect rows that belong to a particular user until we see a row that belongs to another user, at which point we render output for the first user, clear our state, and start working with the new user. Techniques like this show how much flexibility the list iterator API allows. More uses along these lines include filtering rows that should be hidden from a particular result set, finding the top N grouped reduce values (e.g., to sort a tag cloud by popularity), and even writing custom reduce functions (as long as you don’t mind that reductions are not stored incrementally).

Querying Lists

We haven’t looked in detail at the ways list functions are queried. Just like show functions, they are resources available on the design document. The basic path to a list function is as follows:

    /db/_design/foo/_list/list-name/view-name

Because both the list name and the view name are specified, it is possible to render a list against more than one view. For instance, you could have a list function that renders blog comments in the Atom XML format, and then run it against both a global view of recent comments and a view of recent comments by blog post. This would allow you to use the same list function to provide an Atom feed for comments across an entire site, as well as individual comment feeds for each post.

After the path to the list come the view query parameters. Just like a regular view, calling a list function without any query parameters results in a list that reflects every row in the view. Most of the time you’ll want to call it with query parameters to limit the returned data.
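The grouping loop described earlier in this section can be assembled into one runnable sketch. As before, getRow()/send() are stubs for the query server and renderUserConf is a hypothetical renderer. The sample rows assume the view emits each user’s profile row immediately before that user’s subdomain rows (here via a [user, 0] / [user, 1] key, which is an assumption about how such a view might be keyed). The sketch uses !== rather than != and guards the final flush so an empty view produces no output.

```javascript
// Stand-ins for the query server's getRow()/send().
let pendingRows = [];
let chunks = [];
function getRow() { return pendingRows.length > 0 ? pendingRows.shift() : null; }
function send(chunk) { chunks.push(chunk); }

// Hypothetical renderer: one line per user listing that user's subdomains.
function renderUserConf(ownerRow, subdomainRows) {
  const names = subdomainRows.map(function (r) { return r.value.subdomain; });
  return "user " + ownerRow.value.user + ": " + names.join(", ") + "\n";
}

// The grouping loop from the text, wrapped in a complete list function.
const groupedList = function (head, req) {
  let row;
  let subdomainOwnerRow = null;
  let subdomainRows = [];
  while ((row = getRow())) {
    if (!subdomainOwnerRow) {
      subdomainOwnerRow = row; // first row: start the first user
    } else if (row.value.user !== subdomainOwnerRow.value.user) {
      // A new user's profile row: flush the previous user, then reset state.
      send(renderUserConf(subdomainOwnerRow, subdomainRows));
      subdomainRows = [];
      subdomainOwnerRow = row;
    } else {
      subdomainRows.push(row); // same user: keep collecting
    }
  }
  // Flush the last user (skipped if the view returned no rows).
  if (subdomainOwnerRow) {
    send(renderUserConf(subdomainOwnerRow, subdomainRows));
  }
  return "";
};

function runList(listFn, rows, req) {
  pendingRows = rows.slice();
  chunks = [];
  chunks.push(listFn({ total_rows: rows.length, offset: 0 }, req));
  return chunks.join("");
}

const rows = [
  { key: ["alice", 0], value: { user: "alice", type: "profile" } },
  { key: ["alice", 1], value: { user: "alice", subdomain: "blog.alice" } },
  { key: ["alice", 1], value: { user: "alice", subdomain: "photos.alice" } },
  { key: ["bob", 0], value: { user: "bob", type: "profile" } },
  { key: ["bob", 1], value: { user: "bob", subdomain: "www.bob" } },
];

const output = runList(groupedList, rows, { query: {} });
console.log(output);
```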
You’re already familiar with the view query options from Chapter 6. The same query options apply to the _list query. Let’s look at URLs side by side; see Example 9-1.

Example 9-1. A JSON view query

    GET /db/_design/sofa/_view/recent-posts?descending=true&limit=10

This view query is just asking for the 10 most recent blog posts. Of course, this query could include parameters like startkey or skip; we’re leaving them out for simplicity. To run the same query through a list function, we access it via the list resource, as shown in Example 9-2.

Example 9-2. The HTML list query

    GET /db/_design/sofa/_list/index/recent-posts?descending=true&limit=10

The index list here is a function from JSON to HTML. Just like the preceding view query, additional query parameters can be applied to paginate through the list. As we’ll see in Part III, once you have a working list, adding pagination is trivial. See Example 9-3.

Example 9-3. The Atom list query

    GET /db/_design/sofa/_list/index/recent-posts?descending=true&limit=10&format=atom

The list function can also look at the query parameters and, for example, switch which output format it renders based on them. You can even pass the username into the list using a query parameter (but it’s not recommended, as you’ll ruin cache efficiency).

Lists, Etags, and Caching

Just like show functions and view queries, lists are sent with proper HTTP Etags, which makes them cacheable by intermediate proxies. This means that if your server is starting to bog down in list-rendering code, it should be possible to relieve load by using a caching reverse proxy like Squid. We won’t go into the details of Etags and caching here, as they were covered in Chapter 8.
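The URLs in Examples 9-1 through 9-3 differ only in the resource (_view versus _list/index) and in an extra format parameter. The sketch below builds them programmatically and pairs that with one hand-rolled way a list function could read the format back from req.query. The helper names viewUrl, listUrl, and chooseFormat are hypothetical, and the Accept-header fallback is deliberately naive (it ignores q-values); this is not CouchDB’s own format-negotiation mechanism. Parameters whose values are JSON, such as startkey, would need JSON.stringify before being passed in.

```javascript
// Client side (hypothetical helpers): build view and list URLs.
// URLSearchParams is a global in Node.js and modern browsers.
function queryString(params) {
  const qs = new URLSearchParams(params).toString();
  return qs ? "?" + qs : "";
}
function viewUrl(db, ddoc, view, params) {
  return "/" + db + "/_design/" + ddoc + "/_view/" + view + queryString(params);
}
function listUrl(db, ddoc, list, view, params) {
  return "/" + db + "/_design/" + ddoc + "/_list/" + list + "/" + view + queryString(params);
}

// List side (hypothetical helper): pick an output format from req.query,
// falling back to a naive check of the Accept header.
function chooseFormat(req) {
  if (req.query && req.query.format) return req.query.format;
  const accept = (req.headers && req.headers["Accept"]) || "";
  return accept.indexOf("text/html") !== -1 ? "html" : "json";
}

const params = { descending: "true", limit: "10" };
console.log(viewUrl("db", "sofa", "recent-posts", params));
// → /db/_design/sofa/_view/recent-posts?descending=true&limit=10
console.log(listUrl("db", "sofa", "index", "recent-posts", params));
// → /db/_design/sofa/_list/index/recent-posts?descending=true&limit=10
const atomParams = Object.assign({}, params, { format: "atom" });
console.log(listUrl("db", "sofa", "index", "recent-posts", atomParams));
// → /db/_design/sofa/_list/index/recent-posts?descending=true&limit=10&format=atom

// CouchDB parses the query string into req.query before calling the list.
console.log(chooseFormat({ query: atomParams, headers: {} })); // → atom
console.log(chooseFormat({ query: params, headers: { Accept: "text/html,application/xhtml+xml" } })); // → html
```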
PART III
Example Application

CHAPTER 10
Standalone Applications

CouchDB is useful for many areas of an application. Because of its incremental MapReduce and replication characteristics, it is especially well suited to online interactive document and data management tasks. These are the sort of workloads experienced by the majority of web applications. This, coupled with CouchDB’s HTTP interface, makes it a natural fit for the web.

In this part, we’ll tour a document-oriented web application: a basic blog implementation. As a lowest common denominator, we’ll be using plain old HTML and JavaScript. The lessons learned should apply to Django/Rails/Java-style middleware applications and even to intensive MapReduce data mining tasks. CouchDB’s API is the same, regardless of whether you’re running a small installation or an industrial cluster.

There is no right answer about which application development framework you should use with CouchDB. We’ve seen successful applications in almost every commonly used language and framework. For this example application, we’ll use a two-layer architecture: CouchDB as the data layer and the browser for the user interface. We think this is a viable model for many document-oriented applications, and it makes a great way to teach CouchDB, because we can easily assume that all of you have a browser at hand without having to ensure that you’re familiar with a particular server-side scripting language.

Use the Correct Version

This part is interactive, so be prepared to follow along with your laptop and a running CouchDB database.
We’ve made the full example application and all of the source code examples available online, so you’ll start by downloading the current version of the example application and installing it on your CouchDB instance.

A challenge of writing this book and preparing it for production is that CouchDB is evolving at a rapid pace. The basics haven’t changed in a long time, and probably won’t change much in the future, but things around the edges are moving forward rapidly for CouchDB’s 1.0 release.

This book is going to press as CouchDB version 0.10.0 is about to be released. Most of the code was written against 0.9.1 and the development trunk that is becoming version 0.10.0. In this part we’ll work with two other software packages: CouchApp, which is a set of tools for editing and sharing CouchDB application code; and Sofa, the example blog itself. See http://couchapp.org for the latest information about the CouchApp model.

As a reader, it is your responsibility to use the correct versions of these packages. For CouchApp, the correct version is always the latest. The correct version of Sofa depends on which version of CouchDB you are using. To see which version of CouchDB you are using, run the following command:

    curl http://127.0.0.1:5984

You should see something like one of these three examples:

    {"couchdb":"Welcome","version":"0.9.1"}
    {"couchdb":"Welcome","version":"0.10.0"}
    {"couchdb":"Welcome","version":"0.11.0a858744"}

These three correspond to versions 0.9.1, 0.10.0, and trunk. If the version of CouchDB you have installed is 0.9.1 or earlier, you should upgrade to at least 0.10.0, as Sofa makes use of features not present until 0.10.0. There is an older version of Sofa that will work, but this book covers features and APIs that are part of the 0.10.0 release of CouchDB. It’s conceivable that there will be 0.9.2, 0.10.1, and even 0.10.2 releases by the time you read this.
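The version check above is easy to script. This sketch parses the welcome responses shown above and tests whether each server is at least 0.10.0. The function names are hypothetical, and the parser only understands strings that begin with three numeric components, treating a trunk build such as 0.11.0a858744 as 0.11.0.

```javascript
// Hypothetical helper: extract [major, minor, patch] from a version string.
function parseVersion(version) {
  const match = /^(\d+)\.(\d+)\.(\d+)/.exec(version);
  if (!match) throw new Error("Unrecognized version string: " + version);
  return [Number(match[1]), Number(match[2]), Number(match[3])];
}

// True if `version` is the same as or newer than `minimum`.
function atLeast(version, minimum) {
  const a = parseVersion(version);
  const b = parseVersion(minimum);
  for (let i = 0; i < 3; i++) {
    if (a[i] !== b[i]) return a[i] > b[i]; // first differing component decides
  }
  return true; // equal versions
}

// The three welcome responses shown above.
const responses = [
  '{"couchdb":"Welcome","version":"0.9.1"}',
  '{"couchdb":"Welcome","version":"0.10.0"}',
  '{"couchdb":"Welcome","version":"0.11.0a858744"}',
];

responses.forEach(function (body) {
  const version = JSON.parse(body).version;
  console.log(version, atLeast(version, "0.10.0") ? "OK for Sofa" : "please upgrade");
});
```

Comparing components numerically matters here: a plain string comparison would wrongly rank "0.9.1" above "0.10.0".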
Please use the latest release of whichever version you prefer. Trunk refers to the latest development version of CouchDB available in the Apache Subversion repository. We recommend that you use a released version of CouchDB, but as developers, we often use trunk. Sofa’s master branch will tend to work on trunk, so if you want to stay on the cutting edge, that’s the way to do it.

Portable JavaScript

If you’re not familiar with JavaScript, we hope the source examples are given with enough context and explanation so that you can keep up. If you are familiar with JavaScript, you’re probably already excited that CouchDB supports JavaScript functions for rendering views and templates.

One of the advantages of building applications that can be hosted on any standard CouchDB installation is that they are portable via replication. This means your application, if you develop it to be served directly from CouchDB, gets offline mode “for free.” Local data makes a big difference for users in a number of ways we won’t get into here. We call applications that can be hosted from a standard CouchDB CouchApps.

CouchApps are a great vehicle for teaching CouchDB because we don’t need to worry about picking a language or framework; we’ll just work directly with CouchDB so that readers get a quick overview of a familiar application pattern. Once you’ve worked through the example app, you’ll have seen enough to know how to apply CouchDB to your problem domain. If you don’t know much about Ajax development, you’ll learn a little about jQuery as well, and we hope you find the experience relaxing.

Applications Are Documents

Applications are stored as design documents (Figure 10-1). You can replicate design documents just like everything else in CouchDB. Because design documents can be replicated, whole CouchApps are replicated.
CouchApps can be updated via replication, but they are also easily “forked” by the users, who can alter the source code at will.

Figure 10-1. CouchDB executes application code stored in design documents

Because applications are just a special kind of document, they are easy to edit and share.

J. Chris says: Thinking of peer-based application replication takes me back to my first year of high school, when my friends and I would share little programs between the TI-85 graphing calculators we were required to own. Two calculators could be connected via a small cable and we’d share physics cheat sheets, Hangman, some multi-player text-based adventures, and, at the height of our powers, I believe there may have been a Doom clone running. The TI-85 programs were in Basic, so everyone was always hacking each other’s hacks. Perhaps the most ridiculous program was a version of Spy Hunter that you controlled with your mind. The idea was that you could influence the pseudorandom number generator by concentrating hard enough, and thereby control the game. Didn’t work. Anyway, the point is that when you give people access to the source code, there’s no telling what might happen.

If people don’t like the aesthetics of your application, they can tweak the CSS. If people don’t like your interface choices, they can improve the HTML. If they want to modify the functionality, they can edit the JavaScript. Taken to the extreme, they may want to completely fork your application for their own purposes. When they show the modified version to their friends and coworkers, and hopefully you, there is a chance that more people may want to make improvements. As the original developer, you have control over your version and can accept or reject changes as you see fit.
If someone messes around with the source code for a local application and breaks things beyond repair, they can replicate the original copy from your server, as illustrated in Figure 10-2.

Figure 10-2. Replicating application changes to a group of friends

Of course, this may not be your cup of tea. Don’t worry; you can be as restrictive as you like with CouchDB. You can restrict access to data however you wish, but beware of the opportunities you might be missing. There is a middle ground between open collaboration and restricted access controls.

Once you’ve finished the installation procedure, you’ll be able to see the full application code for Sofa, both in your text editor and as a design document in Futon.

Standalone

What happens if you add an HTML file as a document attachment? Exactly the same thing as with any other attachment: CouchDB stores it and serves it back over HTTP. That means we can serve web pages directly with CouchDB. Of course, we might also need images, stylesheets, or scripts. No problem; just add these resources as document attachments and link to them using relative URIs.

Let’s take a step back. What do we have so far? A way to serve HTML documents and other static files on the Web. That means we can build and serve traditional websites using CouchDB. Fantastic! But isn’t this a little like reinventing the wheel? Well, a very important difference is that we also have a document database sitting in the background. We can talk to this database using the JavaScript served up with our web pages. Now we’re really cooking with gas!

CouchDB’s features are a foundation for building standalone web applications backed by a powerful database. As a proof of concept, look no further than CouchDB’s built-in administrative interface. Futon is a fully functional database management application built using HTML, CSS, and JavaScript. Nothing else. CouchDB and web applications go hand in hand.
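As a concrete sketch of the standalone idea, the following builds a design document that carries an HTML page and its stylesheet as inline attachments, which CouchDB accepts as base64 strings with a content_type under _attachments. The document ID, the database name mydb, and the file contents are made up for illustration; the code uses Node.js’s Buffer for base64 encoding and the global URL class to show how a relative link resolves once the page is served from the design document.

```javascript
// Page and stylesheet contents are placeholders for this sketch.
const html =
  "<!DOCTYPE html>\n<html><head><link rel=\"stylesheet\" href=\"style.css\"></head>" +
  "<body><h1>Hello from CouchDB</h1></body></html>\n";
const css = "h1 { color: #c33; }\n";

// An inline attachment is an object with a content_type and base64 data.
function inlineAttachment(contentType, text) {
  return { content_type: contentType, data: Buffer.from(text, "utf8").toString("base64") };
}

const designDoc = {
  _id: "_design/hello",
  _attachments: {
    "index.html": inlineAttachment("text/html", html),
    "style.css": inlineAttachment("text/css", css),
  },
};

// Served from /mydb/_design/hello/index.html, the relative href "style.css"
// resolves to the sibling attachment on the same design document.
const pageUrl = "http://127.0.0.1:5984/mydb/_design/hello/index.html";
const cssUrl = new URL("style.css", pageUrl);
console.log(cssUrl.pathname); // → /mydb/_design/hello/style.css

console.log(JSON.stringify(designDoc, null, 2));
```

Saving this document (for example, with a PUT to /mydb/_design/hello) would make the page available at /mydb/_design/hello/index.html, with no server-side code beyond CouchDB itself.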
In the Wild

There are plenty of examples of CouchApps in the wild. This section includes screenshots of just a few sites and applications that use a standalone CouchDB architecture.

Damien Katz, inventor of CouchDB and writer of this book’s Foreword, decided to see how long it would take to implement a shared calendar with real-time updates as events are changed on the server. It took about an afternoon, thanks to some amazing open source jQuery plug-ins. The calendar demo is still running on J. Chris’s server. See Figure 10-3.

Figure 10-3. Group calendar

Jason Davies swapped out the backend of the Ely Service website with CouchDB, without changing anything visible to the user. The technical details are covered on his blog. See Figure 10-4.

Figure 10-4. Ely Service

Jason also converted his mom’s ecommerce website, Bet Ha Bracha, to a CouchApp. It uses the _update handler to hook into different transaction gateways. See Figure 10-5.

Figure 10-5. Bet Ha Bracha

Processing JS is a toolkit for building animated art that runs in the browser. Processing JS Studio is a gallery for Processing JS sketches. See Figure 10-6.

Figure 10-6. Processing JS Studio

Swinger is a CouchApp for building and sharing presentations. It uses the Sammy JavaScript application framework. See Figure 10-7.

Figure 10-7. Swinger

Nymphormation is a link sharing and tagging site by Benoît Chesneau. It uses CouchDB’s cookie authentication and also makes it possible to share links using replication. See Figure 10-8.
Boom Amazing is a CouchApp by Alexander Lang that allows you to zoom, rotate, and pan around an SVG file, record the different positions, and then replay those for a presentation or something else (from the Boom Amazing README). See Figure 10-9.

Figure 10-8. Nymphormation

Figure 10-9. Boom Amazing

The CouchDB Twitter Client was one of the first standalone CouchApps to be released. It’s documented in J. Chris’s blog post, “My Couch or Yours, Shareable Apps are the Future.” The screenshot in Figure 10-10 shows the word cloud generated from a MapReduce view of CouchDB’s archived tweets. The cloud is normalized against the global view, so universally common words don’t dominate the chart.

Figure 10-10. Twitter Client

Toast is a chat application that allows users to create channels and then invite others to real-time chat. It was initially a demo of the _changes event loop, but it started to take off as a way to chat. See Figure 10-11.

Figure 10-11. Toast

Sofa is the example application for this part, and it has been deployed by a few different authors around the web. The screenshot in Figure 10-12 is from Jan’s Tumblelog. To see Sofa in action, visit J. Chris’s site, which has been running Sofa since late 2008.

Figure 10-12. Sofa

Wrapping Up

J. Chris decided to port his blog from Ruby on Rails to CouchDB. He started by exporting Rails ActiveRecord objects as JSON documents, paring away some features, and adding others as he converted to HTML and JavaScript.
The resulting blog engine features access-controlled posting, open comments with the possibility of moderation, Atom feeds, Markdown formatting, and a few other little goodies.

This book is not about jQuery, so although we use this JavaScript library, we’ll refrain from dwelling on it. Readers familiar with using asynchronous XMLHttpRequest (XHR) should feel right at home with the code. Keep in mind that the figures and code samples in this part omit many of the bookkeeping details.

We will be studying this application and learning how it exercises all the core features of CouchDB. The skills learned in this part should be broadly applicable to any CouchDB application domain, whether you intend to build a self-hosted CouchApp or not.

CHAPTER 11
Managing Design Documents

Applications can live in CouchDB. Nice. You just attach a bunch of HTML and JavaScript files to a design document and you are good to go. Spice that up with view-powered queries and show functions that render any media type from your JSON documents, and you have all it takes to write self-contained CouchDB applications.

Working with the Example Application

If you want to install and hack on your own version of Sofa while you read the following chapters, we’ll be using CouchApp to upload the source code as we explore it.

We’re particularly excited by the prospect of deploying applications to CouchDB because relying on a least-common-denominator environment encourages users to control not just the data but also the source code, which will let more people build personal web apps. And when the web app you’ve hacked together in your spare time hits the big time, the ability of CouchDB to scale to larger infrastructure sure doesn’t hurt.
In a CouchDB design document, there is a mix of development languages (HTML, JS, CSS) that go into different places like attachments and design document attributes. Ideally, you want your development environment to help you as much as possible. More importantly, you’re already used to proper syntax highlighting, validation, integrated documentation, macros, helpers, and whatnot. Editing HTML and JavaScript code as the string attributes of a JSON object is not exactly modern computing.

Lucky for you, we’ve been working on a solution. Enter CouchApp. CouchApp lets you develop CouchDB applications in a convenient directory hierarchy: views and shows are separate, neatly organized .js files; your static assets (CSS, images) have their place; and with the simplicity of a couchapp push, you save your app to a design document in CouchDB. Make a change? couchapp push and off you go.

This chapter guides you through the installation and moving parts of CouchApp. You will learn what other neat helpers it has in store to make your life easier. Once we have CouchApp, we’ll use it to install and deploy Sofa to a CouchDB database.

Installing CouchApp

The CouchApp Python script and JavaScript framework we’ll be using grew out of the work designing this example application. It’s now in use for a variety of applications, and has a mailing list, wiki, and a community of hackers. Just search the Internet for “couchapp” to find the latest information. Many thanks to Benoît Chesneau for building and maintaining the library (and contributing to CouchDB’s Erlang codebase and many of the Python libraries).

CouchApp is easiest to install using the Python easy_install script, which is part of the setuptools package. If you are on a Mac, easy_install should already be available.
If easy_install is not installed and you are on a Debian variant, such as Ubuntu, you can use the following command to install it:

    sudo apt-get install python-setuptools

Once you have easy_install, installing CouchApp should be as easy as:

    sudo easy_install -U couchapp

Hopefully, this works and you are ready to start using CouchApp. If not, read on.

The most common problem people have installing CouchApp is with old versions of dependencies, especially easy_install itself. If you got an installation error, the best next step is to attempt to upgrade setuptools and then upgrade CouchApp by running the following commands:

    sudo easy_install -U setuptools
    sudo easy_install -U couchapp

If you have other problems installing CouchApp, have a look at setuptools for Python’s easy_install troubleshooting, or visit the CouchApp mailing list.

Using CouchApp

Installing CouchApp via easy_install should, as they say, be easy. Assuming all goes according to plan, it takes care of any dependencies and puts the couchapp utility into your system’s PATH so you can immediately begin by running the help command:

    couchapp --help

We’ll be using the clone and push commands. clone pulls an application from a running instance in the cloud, saving it as a directory structure on your filesystem. push deploys a standalone CouchDB application from your filesystem to any CouchDB over which you have administrative control.

Download the Sofa Source Code

There are three ways to get the Sofa source code. They are all equally valid; it’s just a matter of personal preference and how you plan to use the code once you have it. The easiest way is to use CouchApp to clone it from a running instance. If you didn’t install CouchApp in the previous section, you can read the source code (but not install and run it) by downloading and extracting the ZIP or TAR file.
If you are interested in hacking on Sofa and would like to join the development community, the best way to get the source code is from the official Git repository. We’ll cover these three methods in turn. First, enjoy Figure 11-1.

Figure 11-1. A happy bird to ease any install-induced frustration

CouchApp Clone

One of the easiest ways to get the Sofa source code is by cloning directly from J. Chris’s blog, using CouchApp’s clone command to download Sofa’s design document to a collection of files on your local hard drive. The clone command operates on a design document URL, which can be hosted in any CouchDB database accessible via HTTP. To clone Sofa from the version running on J. Chris’s blog, run the following command:

    couchapp clone http://jchrisa.net/drl/_design/sofa

You should see this output:

    [INFO] Cloning sofa to ./sofa

Now that you’ve got Sofa on your local filesystem, you can skip to “Deploying Sofa” on page 115 to make a small local change and push it to your own CouchDB.

ZIP and TAR Files

If you merely want to peruse the source code while reading along with this book, it is available as standard ZIP or TAR downloads. To get the ZIP version, access the following URL from your browser, which will redirect to the latest ZIP file of Sofa: http://github.com/jchris/sofa/zipball/master. If you prefer, a TAR file is available as well: http://github.com/jchris/sofa/tarball/master.

Join the Sofa Development Community on GitHub

The most up-to-date version of Sofa will always be available at its public code repository. If you are interested in staying up-to-date with development efforts and contributing patches back to the source, the best way to do it is via Git and GitHub. Git is a form of distributed version control that allows groups of developers to track and share changes to software.
If you are familiar with Git, you’ll have no trouble using it to work on Sofa. If you’ve never used Git before, it has a bit of a learning curve, so depending on your tolerance for new software, you might want to save learning Git for another day, or you might want to dive in head first! For more information about Git and how to install it, see the official Git home page. For other hints and help using Git, see the GitHub guides.

To get Sofa (including all development history) using Git, run the following command:

    git clone git://github.com/jchris/sofa.git

Now that you’ve got the source, let’s take a quick tour.

The Sofa Source Tree

Once you’ve succeeded with any of these methods, you’ll have a copy of Sofa on your local disk. The following text is generated by running the tree command on the Sofa directory to reveal the full set of files it contains. Sections of the text are annotated to make it clear how various files and directories correspond to the Sofa design document.

    sofa/
    |-- README.md
    |-- THANKS.txt

The source tree contains some files that aren’t necessary for the application; the README and THANKS files are among those.

    |-- _attachments
    |   |-- LICENSE.txt
    |   |-- account.html
    |   |-- blog.js
    |   |-- jquery.scrollTo.js
    |   |-- md5.js
    |   |-- screen.css
    |   |-- showdown-licenese.txt
    |   |-- showdown.js
    |   |-- tests.js
    |   `-- textile.js

The _attachments directory contains files that are saved to the Sofa design document as binary attachments. CouchDB serves attachments directly (instead of including them in a JSON wrapper), so this is where we store JavaScript, CSS, and HTML files that the browser will access directly. Making your first edit to the Sofa source code will show you how easy it is to modify the application.

    |-- blog.json

The blog.json file contains JSON used to configure individual installations of Sofa.
Currently, it sets one value: the title of the blog. You should open this file now and personalize the title field. You probably don’t want to name your blog “Daytime Running Lights,” so now’s your chance to come up with something more fun! You could add other blog configurations to this file, such as how many posts to show per page and a URL for an About page for the author. Working changes like these into the application will be easy once you’ve walked through later chapters.

    |-- couchapp.json

We’ll see later that couchapp outputs a link to Sofa’s home page when couchapp push is run. The mechanism is simple. CouchApp looks for a JSON field on the design document at the address design_doc.couchapp.index. If it finds that field, it appends the value to the location of the design document itself to build the URL. If no CouchApp index is specified but the design document has an attachment called index.html, that attachment is treated as the index page. In Sofa’s case, we use the index value to point to a list of the most recent posts.

    |-- helpers
    |   `-- md5.js

The name of the helpers directory is an arbitrary choice, because CouchApp will push any files and folders to the design document. In this case, the source code of md5.js is JSON-encoded and stored on the design_document.helpers.md5 element.

    |-- lists
    |   `-- index.js

The lists directory contains a JavaScript function that will be executed by CouchDB to render view rows as Sofa’s HTML and Atom indexes. You could add new list functions by creating new files within this directory. Lists are covered in depth in Chapter 14.

    |-- shows
    |   |-- edit.js
    |   `-- post.js

The shows directory holds the functions CouchDB uses to generate HTML views of blog posts. There are two show functions: one for reading posts and the other for editing them. We’ll look at these functions in the next few chapters.
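The index-URL rule described above for couchapp.json can be sketched in a few lines. This is a hypothetical helper: resolveIndexUrl is our own name and is not part of CouchApp, and the code is not CouchApp’s implementation. It assumes a plain design document object with an optional couchapp.index string and an _attachments map.

```javascript
// Hypothetical sketch of the index-URL rule described in the text.
// Not CouchApp's actual code; names are illustrative.
function resolveIndexUrl(designDocUrl, designDoc) {
  // Normalize the base so we join with exactly one "/".
  const base = designDocUrl.replace(/\/+$/, "");
  const index = designDoc.couchapp && designDoc.couchapp.index;
  if (typeof index === "string" && index.length > 0) {
    // couchapp.index is appended to the design document's own location.
    return base + "/" + index.replace(/^\/+/, "");
  }
  const attachments = designDoc._attachments || {};
  if (Object.prototype.hasOwnProperty.call(attachments, "index.html")) {
    // Fallback: an index.html attachment is treated as the index page.
    return base + "/index.html";
  }
  return null; // no index page could be determined
}

// A design document fragment shaped like Sofa's (only the relevant field).
const sofaDdoc = {
  _id: "_design/sofa",
  couchapp: { index: "_list/index/recent-posts?descending=true&limit=5" },
};

console.log(resolveIndexUrl("http://127.0.0.1:5984/sofa/_design/sofa", sofaDdoc));
// http://127.0.0.1:5984/sofa/_design/sofa/_list/index/recent-posts?descending=true&limit=5
```

The result matches the “Visit your CouchApp here” URL that couchapp push prints for Sofa. That is why the index value is stored relative to the design document and not as an absolute URL: the same value keeps working whichever database you push to.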
    |-- templates
    |   |-- edit.html
    |   |-- index
    |   |   |-- head.html
    |   |   |-- row.html
    |   |   `-- tail.html
    |   `-- post.html

The templates directory is like the helpers directory, and unlike the lists, shows, or views directories, in that the code stored there is not executed directly on CouchDB’s server side. Instead, CouchApp macros include the templates into the body of the list and show functions when the code is pushed to the server. These CouchApp macros are covered in Chapter 12. The key point is that the name templates could be anything. It is not a special member of the design document, just a convenient place to store and edit our template files.

    |-- validate_doc_update.js

This file corresponds to the JavaScript validation function used by Sofa to ensure that only the blog owner can create new posts and that comments are well formed. Sofa’s validation function is covered in detail in Chapter 12.

    |-- vendor
    |   `-- couchapp
    |       |-- README.md
    |       |-- _attachments
    |       |   `-- jquery.couchapp.js
    |       |-- couchapp.js
    |       |-- date.js
    |       |-- path.js
    |       `-- template.js

The vendor directory holds code that is managed independently of the Sofa application itself. In Sofa’s case, the only vendor package used is couchapp, which contains JavaScript code that knows how to do things like link between list and show URLs and render templates.

During couchapp push, files within a vendor/**/_attachments/* path are pushed as design document attachments. In this case, jquery.couchapp.js will be pushed to an attachment called couchapp/jquery.couchapp.js, so that multiple vendor packages can have the same attachment names without risk of collisions.
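The attachment-naming rule just described can be expressed as a small path mapping. This is a hypothetical sketch (attachmentNameFor is our own name, not CouchApp code). It handles only the two layouts shown in the Sofa tree: top-level _attachments/ and vendor/<package>/_attachments/. It assumes the package name is the directory directly under vendor.

```javascript
// Hypothetical sketch: given a path relative to the app root, return the
// attachment name it would get on the design document, or null if the file
// is not pushed as an attachment. Covers only the layouts shown in the tree.
function attachmentNameFor(relPath) {
  const parts = relPath.split("/");
  // _attachments/<rest>  ->  "<rest>"
  if (parts[0] === "_attachments" && parts.length > 1) {
    return parts.slice(1).join("/");
  }
  // vendor/<pkg>/_attachments/<rest>  ->  "<pkg>/<rest>"
  // The package prefix keeps two vendor packages from colliding.
  if (parts[0] === "vendor" && parts[2] === "_attachments" && parts.length > 3) {
    return parts[1] + "/" + parts.slice(3).join("/");
  }
  return null; // e.g. shows/edit.js becomes a JSON field, not an attachment
}

console.log(attachmentNameFor("vendor/couchapp/_attachments/jquery.couchapp.js"));
// couchapp/jquery.couchapp.js
console.log(attachmentNameFor("_attachments/blog.js"));
// blog.js
```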
    `-- views
        |-- comments
        |   |-- map.js
        |   `-- reduce.js
        |-- recent-posts
        |   `-- map.js
        `-- tags
            |-- map.js
            `-- reduce.js

The views directory holds MapReduce view definitions. Each view is represented as a directory holding files that correspond to its map and reduce functions.

Deploying Sofa

The source code is safely on your hard drive, and you’ve even been able to make minor edits to the blog.json file. Now it’s time to deploy the blog to a local CouchDB. The push command is simple and should work the first time, but two other steps are involved: setting up an admin account on your CouchDB and supplying its credentials to your CouchApp deployments. By the end of this chapter you’ll have your own running copy of Sofa.

Pushing Sofa to Your CouchDB

Any time you make edits to the on-disk version of Sofa and want to see them in your browser, run the following command:

    couchapp push . sofa

This deploys the Sofa source code into CouchDB. You should see output like this:

    [INFO] Pushing CouchApp in /Users/jchris/sofa to design doc:
    http://127.0.0.1:5984/sofa/_design/sofa
    [INFO] Visit your CouchApp here:
    http://127.0.0.1:5984/sofa/_design/sofa/_list/index/recent-posts?descending=true&limit=5

If you get an error, make sure your target CouchDB instance is running by making a simple HTTP request to it:

    curl http://127.0.0.1:5984

The response should look like:

    {"couchdb":"Welcome","version":"0.10.1"}

If CouchDB is not running yet, go back to Chapter 3 and follow the “Hello World” instructions there.

Visit the Application

If CouchDB was running, then couchapp push should have directed you to visit the application’s index URL. Visiting the URL should show you something like Figure 11-2.

Figure 11-2.
Empty index page

We’re not done yet. A couple of steps remain before you’ve got a fully functional Sofa instance.

Set Up Your Admin Account

Sofa is a single-user application. You, the author, are the administrator and the only one who can add and edit posts. To make sure no one else goes in and messes with your writing, you must create an administrator account in CouchDB. This is a straightforward task. Find your local.ini file and open it in your text editor. (By default, it’s stored at /usr/local/etc/couchdb/local.ini.) If you haven’t already, uncomment the [admins] section at the end of the file. Next, add a line right below the [admins] section with your preferred username and password:

    [admins]
    jchris = secretpass

Now that you’ve edited your local.ini configuration file, you need to restart CouchDB for the changes to take effect. The method for restarting depends on how you started CouchDB. If you started it in a console, the simplest way is to press Ctrl-C and rerun the same command you used to start it.

If you don’t like your passwords lying around in plain-text files, don’t worry. When CouchDB starts up and reads this file, it takes your password and changes it to a secure hash, like this:

    [admins]
    jchris = -hashed-207b1b4f8434dc604206c2c0c2aa3aae61568d6c,96406178007181395cb72cb4e8f2e66e

CouchDB will now ask you for your credentials when you try to create databases or change documents, which are exactly the things you want to keep to yourself.

Deploying to a Secure CouchDB

Now that we’ve set up admin credentials, we’ll need to supply them on the command line when running couchapp push. Let’s try it:

    couchapp push . http://jchris:secretpass@localhost:5984/sofa

Make sure to replace jchris and secretpass with your actual values, or you will get a “permission denied” error.
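If your real password contains characters such as @ or :, pasting it raw into the URL above will confuse URL parsing, because those characters delimit the credentials and host. As a minimal sketch, you can let the WHATWG URL API (a global in modern Node.js and browsers) percent-encode the credentials for you. pushTarget is a hypothetical helper name. Whether a particular client decodes percent-encoded credentials before sending them is worth verifying against your own setup.

```javascript
// Sketch: build a push target URL with embedded credentials.
// Assigning url.username / url.password percent-encodes characters such as
// "@" and ":" that would otherwise break the user:password@host part.
function pushTarget(dbUrl, username, password) {
  const url = new URL(dbUrl);
  url.username = username;
  url.password = password;
  return url.href;
}

console.log(pushTarget("http://localhost:5984/sofa", "jchris", "secretpass"));
// http://jchris:secretpass@localhost:5984/sofa
console.log(pushTarget("http://localhost:5984/sofa", "jchris", "p@ss:word"));
// http://jchris:p%40ss%3Aword@localhost:5984/sofa
```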
If all works according to plan, everything will be set up in CouchDB and you should be able to start using your blog. At this point, we are technically ready to move on, but you’ll be much happier if you make use of the .couchapprc file as documented in the next section.

Configuring CouchApp with .couchapprc

If you don’t want to put the full URL of your database (potentially including authentication parameters) on the command line each time you push, you can use the .couchapprc file to store deployment settings. The contents of this file are not pushed along with the rest of the app, so it can be a safe place to keep credentials for uploading your app to secure servers.

The .couchapprc file lives in the source directory of your application, so check whether it exists at /path/to/the/directory/of/sofa/.couchapprc (or create it there if it is missing). Dot files (files with names that start with a period) are left out of most directory listings. Use whatever tricks your OS has to “show hidden files.” The simplest one in a standard command shell is to list the directory using ls -a, which will show all hidden files as well as normal files.

    {
      "env": {
        "default": {
          "db": "http://jchris:secretpass@localhost:5984/sofa"
        },
        "staging": {
          "db": "http://jchris:secretpass@jchrisa.net:5984/sofa-staging"
        },
        "drl": {
          "db": "http://jchris:secretpass@jchrisa.net/drl"
        }
      }
    }

With this file set up, you can push your CouchApp with the command couchapp push, which will push the application to the “default” database. CouchApp also supports alternate environments. To push your application to the staging database defined in this file, you could use couchapp push staging. In our experience, taking the time to set up a good .couchapprc is always worth it. Another benefit is that it keeps your passwords off the screen when you are working.
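The environment lookup just described amounts to picking env[name].db, with “default” as the fallback name. Below is a hypothetical sketch of that behavior. resolvePushDb and redact are our own names, and this is not CouchApp’s source. The redact helper uses the URL API to mask the password before a URL is printed, in the spirit of keeping passwords off the screen.

```javascript
// Hypothetical sketch of selecting a push target from a parsed .couchapprc.
// Mirrors the behavior described in the text, not CouchApp's actual code.
function resolvePushDb(rc, envName) {
  const name = envName || "default";
  const envs = (rc && rc.env) || {};
  const env = envs[name];
  if (!env || typeof env.db !== "string") {
    throw new Error(`no "db" configured for env "${name}" in .couchapprc`);
  }
  return env.db;
}

// Mask the password so the URL is safe to print in logs or on screen.
function redact(dbUrl) {
  const url = new URL(dbUrl);
  if (url.password) url.password = "****";
  return url.href;
}

const rc = JSON.parse(`{
  "env": {
    "default": { "db": "http://jchris:secretpass@localhost:5984/sofa" },
    "staging": { "db": "http://jchris:secretpass@jchrisa.net:5984/sofa-staging" }
  }
}`);

console.log(redact(resolvePushDb(rc)));            // http://jchris:****@localhost:5984/sofa
console.log(redact(resolvePushDb(rc, "staging"))); // http://jchris:****@jchrisa.net:5984/sofa-staging
```

Failing loudly on an unknown environment name is a deliberate choice in this sketch. Silently falling back to “default” would risk pushing to the wrong database.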
CHAPTER 12
Storing Documents

Documents are CouchDB’s central data structure. To best understand and use CouchDB, you need to think in documents. This chapter walks you through the lifecycle of designing and saving a document. We’ll follow up by reading documents and aggregating and querying them with views. In the next section, you’ll see how CouchDB can also transform documents into other formats.

Documents are self-contained units of data. You might have heard the term record to describe something similar. Your data is usually made up of small native types such as integers and strings. Documents are the first level of abstraction over these native types. They provide some structure and logically group the primitive data. The height of a person might be encoded as an integer (176), but this integer is usually part of a larger structure that contains a label ("height": 176) and related data ({"name":"Chris", "height": 176}).

How many data items you put into your documents depends on your application, and a bit on how you want to use views (covered later). Generally, though, a document roughly corresponds to an object instance in your programming language. Are you running an online shop? You will have items and sales and comments for your items. They all make good candidates for objects and, subsequently, documents.

Documents differ subtly from garden-variety objects in that they usually have authors and CRUD operations (create, read, update, delete). Document-based software (like the word processors and spreadsheets of yore) builds its storage model around saving documents so that authors get back what they created. Similarly, in a CouchDB application you may find yourself giving greater leeway to the presentation layer.
If you let the user control timestamps instead of adding them to your data in a controller, you get draft status and the ability to publish articles in the future for free (by viewing published documents using an endkey of now). Validation functions are available so that you don’t have to worry about bad data causing errors in your system. Often in document-based software, the client application edits and manipulates the data, saving it back. As long as you give the user the document she asked you to save, she’ll be happy.

Say your users can comment on the item (“lovely book”). You have the option to store the comments as an array on the item document. This makes it trivial to find the item’s comments, but, as they say, “it doesn’t scale.” A popular item could have tens of comments, or even hundreds or more.

Instead of storing a list on the item document, in this case it may be better to model comments as a collection of documents. There are patterns for accessing collections, which CouchDB makes easy. You likely want to show only 10 or 20 at a time and provide previous and next links. By handling comments as individual entities, you can group them with views. A group could be the entire collection or slices of 10 or 20, sorted by the item they apply to so that it’s easy to grab the set you need.

A rule of thumb: break up into documents everything that you will be handling separately in your application. Items are single, and comments are single, but you don’t need to break them into smaller pieces. Views are a convenient way to group your documents in meaningful ways.

Let’s go through building our example application to show you in practice how to work with documents.

JSON Document Format

The first step in designing any application (once you know what the program is for and have the user interaction nailed down) is deciding on the format it will use to represent and store data.
Our example blog is written in JavaScript. A few lines back we said documents roughly represent your data objects. In this case, there is an exact correspondence. CouchDB borrowed the JSON data format from JavaScript, which allows us to use documents directly as native objects when programming. This is really convenient and leads to fewer problems down the road (if you ever worked with an ORM system, you might know what we are hinting at).

Let’s draft a JSON format for blog posts. We know we’ll need each post to have an author, a title, and a body. We’d like to use document IDs to find documents so that URLs are search engine–friendly, and we’d also like to list them by creation date.

It should be pretty straightforward to see how JSON works. Curly braces ({}) wrap objects, and objects are key/value lists. Keys are strings that are wrapped in double quotes (""). A value is a string, a number, a boolean (true or false), null, an object, or an array ([]). Keys and values are separated by a colon (:), and multiple key/value pairs by commas (,). That’s it. For a complete description of the JSON format, see Appendix E.

Figure 12-1 shows a document that meets our requirements. The cool thing is that we just made it up on the spot. We didn’t go and define a schema, and we didn’t define how things should look. We just created a document with whatever we needed. Now, requirements for objects change all the time during the development of an application. Coming up with a different document that meets new, evolved needs is just as easy.

Figure 12-1. The JSON post format

Do I really look like a guy with a plan? You know what I am? I’m a dog chasing cars. I wouldn’t know what to do with one if I caught it. You know, I just do things. The mob has plans, the cops have plans, Gordon’s got plans. You know, they’re schemers. Schemers trying to control their little worlds. I’m not a schemer.
I try to show the schemers how pathetic their attempts to control things really are.
—The Joker, The Dark Knight

Let’s examine the document in a little more detail. The first two members (_id and _rev) are for CouchDB’s housekeeping and act as identification for a particular instance of a document. _id is easy: if I store something in CouchDB without specifying an _id, CouchDB creates one and returns it to me. I can use the _id to build the URL where I can get my something back.

Your document’s _id defines the URL the document can be found under. Say you have a database movies. All documents can be found somewhere under the URL /movies, but where exactly? If you store a document with the _id Jabberwocky ({"_id":"Jabberwocky"}) into your movies database, it will be available under the URL /movies/Jabberwocky. So if you send a GET request to /movies/Jabberwocky, you will get back the JSON that makes up your document ({"_id":"Jabberwocky"}).

The _rev (or revision ID) describes a version of a document. Each change creates a new document version (which again is self-contained) and updates the _rev. This matters because, when saving a document, you must provide an up-to-date _rev so that CouchDB knows you’ve been working against the latest document version. We touched on this in Chapter 2. The revision ID acts as a gatekeeper for writes to a document in CouchDB’s MVCC system. A document is a shared resource, and many clients can read and write it at the same time. To make sure two writing clients don’t step on each other’s toes, each client must provide what it believes is the latest revision ID of a document along with the proposed changes. If the on-disk revision ID matches the provided _rev, CouchDB will accept the change. If it doesn’t, the update will be rejected. The client should then read the latest version, integrate the changes, and try saving again.
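The read, check, and retry cycle just described can be modeled with a small in-memory toy. This is a sketch, not CouchDB’s implementation. ToyStore and updateWithRetry are made-up names, and the _rev values here are “generation-counter” strings, whereas real revisions are computed by the server. A real CouchDB answers a stale _rev with an HTTP 409 Conflict. The toy returns a { ok: false, error: "conflict" } object instead.

```javascript
// Toy model of revision-checked writes. NOT CouchDB's implementation:
// real _rev values are computed by the server; here the suffix is a counter.
class ToyStore {
  constructor() {
    this.docs = new Map(); // _id -> latest stored version
    this.counter = 0;
  }

  get(id) {
    const doc = this.docs.get(id);
    return doc ? { ...doc } : null; // hand out a copy, never the stored object
  }

  put(doc) {
    const current = this.docs.get(doc._id);
    // New documents carry no _rev; updates must carry the latest _rev.
    const expected = current ? current._rev : undefined;
    if (doc._rev !== expected) {
      return { ok: false, error: "conflict" };
    }
    const generation = current ? Number(current._rev.split("-")[0]) + 1 : 1;
    const saved = { ...doc, _rev: `${generation}-${++this.counter}` };
    this.docs.set(doc._id, saved);
    return { ok: true, id: saved._id, rev: saved._rev };
  }
}

// Read the latest version, apply the change, save; re-read and retry on conflict.
// `change` receives the latest document (or null if it doesn't exist yet).
function updateWithRetry(store, id, change, maxAttempts = 3) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const result = store.put(change(store.get(id)));
    if (result.ok) return result;
  }
  throw new Error(`gave up updating ${id} after ${maxAttempts} attempts`);
}

const store = new ToyStore();
store.put({ _id: "Hello-Sofa", title: "Hello Sofa" });
const a = store.get("Hello-Sofa");
const b = store.get("Hello-Sofa");
console.log(store.put({ ...a, title: "Hello, Sofa" }).ok); // true: a had the latest _rev
console.log(store.put({ ...b, title: "Hi Sofa" }).error);  // "conflict": b is now stale
console.log(updateWithRetry(store, "Hello-Sofa", (doc) => ({ ...doc, tags: ["example"] })).rev); // "3-3"
```

Nothing in the toy store waits or locks. Each write either matches the current revision or is turned away, and the caller decides how to merge.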
This mechanism ensures two things: a client can only overwrite a version it knows, and it can’t trip over changes made by other clients. This works without CouchDB having to manage explicit locks on any document, so no client has to wait for another client to complete any work. Updates are serialized, so CouchDB will never attempt to write documents faster than your disk can spin, and two mutually conflicting writes can’t be written at the same time.

Beyond _id and _rev: Your Document Data

Now that you thoroughly understand the role of _id and _rev on a document, let’s look at everything else we’re storing.

    {
      "_id":"Hello-Sofa",
      "_rev":"2-2143609722",
      "type":"post",

The first thing is the type of the document. Note that this is an application-level parameter, not anything particular to CouchDB. As far as CouchDB is concerned, the type is just an arbitrarily named key/value pair. For us, as we’re adding blog posts to Sofa, it has a little deeper meaning. Sofa uses the type field to determine which validations to apply. It can then rely on documents of that type being valid in the views and the user interface. This removes the need to check for every field and nested JSON value before using it. This is purely by convention, and you can make up your own or infer the type of a document by its structure (“has an array with three elements,” a.k.a. duck typing). We just thought this was easy to follow and hope you agree.

      "author":"jchris",
      "title":"Hello Sofa",

The author and title fields are set when the post is created. The title field can be changed, but the author field is locked by the validation function for security. Only the author may edit the post.

      "tags":["example","blog post","json"],

Sofa’s tag system stores tags as an array on the document. This kind of denormalization is a particularly good fit for CouchDB.

      "format":"markdown",
      "body":"some markdown text",
      "html":"<p>the html text</p>",

Blog posts are composed in Markdown to make them easy to author. The Markdown source as typed by the user is stored in the body field. Before the blog post is saved, Sofa converts it to HTML in the client’s browser. There is an interface for previewing the Markdown conversion, so you can be sure it will display as you like.

      "created_at":"2009/05/25 06:10:40 +0000"
    }

The created_at field is used to order blog posts in the Atom feed and on the HTML index page.

The Edit Page

The first page we need to build to get one of these blog entries into our blog is the interface for creating and editing posts. Editing is more complex than just rendering posts for visitors to read, but that means once you’ve read this chapter, you’ll have seen most of the techniques we touch on in the other chapters.

The first thing to look at is the show function used to render the HTML page. If you haven’t already, read Chapter 8 to learn about the details of the API. We’ll just look at this code in the context of Sofa, so you can see how it all fits together.

    function(doc, req) {
      // !json templates.edit
      // !json blog
      // !code vendor/couchapp/path.js
      // !code vendor/couchapp/template.js

Sofa’s edit page show function is very straightforward. The macro comments at the top of the function pull in the templates and libraries we’ll use. The important line is the first !json macro, which loads the edit.html template from the templates directory. These macros are run by CouchApp as Sofa is being deployed to CouchDB. For more information about the macros, see Chapter 13.

      // we only show html
      return template(templates.edit, {
        doc : doc,
        // docid is null when the document doesn't exist yet (a new post)
        docid : toJSON((doc && doc._id) || null),
        blog : blog,
        assets : assetPath(),
        index : listPath('index','recent-posts',{descending:true,limit:8})
      });
    }

The rest of the function is simple. We’re just rendering the HTML template with data culled from the document.
In the case where the document does not yet exist, we make sure to set the docid to null. This allows us to use the same template both for creating new blog posts and for editing existing ones.

The HTML Scaffold

The only missing piece of this puzzle is the HTML that it takes to save a document like this.

In your browser, visit http://127.0.0.1:5984/blog/_design/sofa/_show/edit and, using your text editor, open the source file templates/edit.html (or view source in your browser). Everything is ready to go; all we have to do is wire up CouchDB using in-page JavaScript. See Figure 12-2.

Just like any web application, the important part of the HTML is the form for accepting edits. The edit form captures a few basic data items: the post title, the body (in Markdown format), and any tags the author would like to apply.
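To see how those three form fields could become a document shaped like the Hello-Sofa example, here is a hypothetical sketch of the in-page step. It is not Sofa’s actual blog.js. The helper names, the comma-separated tag input, and the title-to-_id slug rule are all assumptions, and the Markdown-to-HTML conversion is left out.

```javascript
// Hypothetical sketch: turn edit-form values into a post document shaped like
// the Hello-Sofa example. Helper names and the slug rule are assumptions.
const pad2 = (n) => String(n).padStart(2, "0");

// created_at in the example's style, always UTC: "2009/05/25 06:10:40 +0000"
function formatCreatedAt(date) {
  return `${date.getUTCFullYear()}/${pad2(date.getUTCMonth() + 1)}/${pad2(date.getUTCDate())} ` +
    `${pad2(date.getUTCHours())}:${pad2(date.getUTCMinutes())}:${pad2(date.getUTCSeconds())} +0000`;
}

// "Hello Sofa" -> "Hello-Sofa": a URL-friendly ID (assumed rule).
function slugify(title) {
  return title.trim().replace(/[^A-Za-z0-9]+/g, "-").replace(/^-+|-+$/g, "");
}

// form: { title, body, tags } as strings; existing: the loaded doc, or null for a new post.
function buildPostDoc(form, author, existing, now) {
  const tags = form.tags.split(",").map((t) => t.trim()).filter((t) => t.length > 0);
  const doc = {
    _id: existing ? existing._id : slugify(form.title),
    type: "post",
    author: existing ? existing.author : author, // author never changes on edit
    title: form.title,
    tags: tags,
    format: "markdown",
    body: form.body,
    created_at: existing ? existing.created_at : formatCreatedAt(now),
  };
  // When editing, send back the _rev we loaded so a stale save is rejected.
  if (existing) doc._rev = existing._rev;
  return doc;
}

const post = buildPostDoc(
  { title: "Hello Sofa", body: "some markdown text", tags: "example, blog post, json" },
  "jchris", null, new Date(Date.UTC(2009, 4, 25, 6, 10, 40)));
console.log(JSON.stringify(post, null, 2));
```

The zero-padded, always-UTC created_at format has a useful property: plain string comparison sorts these timestamps chronologically. That is what lets them serve directly as view keys for ordering posts.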
Edit this post
<%= blogName %>
Again, we’re seeing template tags used to replace content. In this case, we link to the edit page for this post, as well as to the index page of the blog.
<%= title %>
<%= date %>
The post title is used for the tag, and the date is rendered in a special tag with a
class of date. See “Dynamic Dates” on page 134 for an explanation of why we output
static dates in the HTML instead of rendering a user-friendly string like “3 days ago”
to describe the date.
<%= post %>
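The <%= ... %> tags in these templates stand for values that are filled in when the show function renders the page. Below is a minimal, hypothetical sketch of that substitution (renderTemplate is our own name). CouchApp’s vendored template.js is a separate, more capable engine, and the HTML markup in the example is made up. This sketch inserts values verbatim, without HTML escaping. That suits already-rendered HTML such as the post body, but it is only safe for trusted values.

```javascript
// Minimal sketch of <%= name %> substitution. Not CouchApp's template.js.
// Values are inserted verbatim (no escaping); unknown names render as "".
function renderTemplate(source, data) {
  return source.replace(/<%=\s*([A-Za-z_$][\w$]*)\s*%>/g, (match, name) =>
    Object.prototype.hasOwnProperty.call(data, name) ? String(data[name]) : ""
  );
}

const page = renderTemplate(
  '<h1><%= title %></h1>\n<span class="date"><%= date %></span>\n<div><%= post %></div>',
  { title: "Hello Sofa", date: "2009/05/25 06:10:40 +0000", post: "<p>the html text</p>" }
);
console.log(page);
// <h1>Hello Sofa</h1>
// <span class="date">2009/05/25 06:10:40 +0000</span>
// <div><p>the html text</p></div>
```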