A MongoDB White Paper

MongoDB Architecture Guide
MongoDB 3.6 & 4.0 Preview
May 2018

Table of Contents

Introduction
The Best Way to Work with Data: The Document Model
  Easy: A Natural, Intuitive Data Model
  Flexible: Dynamically Adapting to Changes
  Fast: Great Performance
  Versatile: Various Data Models and Access Patterns
  MongoDB Stitch
Put Data Where You Need It: Intelligent Distributed Systems Architecture
  Relational Database Challenges
  MongoDB Distributed Systems Architecture
  Data Security
Freedom to Run Anywhere
  MongoDB Atlas
  MongoDB Ops Manager
  Cloud Adoption Stages
Conclusion and Next Steps
We Can Help
Resources

Introduction

The success of every business rests on its ability to use technology, and in particular software and data, to create a competitive advantage. Companies want to quickly develop new digital products and services to drive expansion of revenue streams, improve customer experience by engaging customers in more meaningful ways, and identify opportunities to reduce the risk and cost of doing business.

Organizations turn to various strategies to enable these technology transformations:

• Aligning behind new IT models and processes, such as Agile and DevOps methodologies.
• Adopting new architectures and platforms by taking a mobile-first approach, moving to microservices patterns, and shifting underlying infrastructure to the cloud.
• Exploiting emerging technologies including AI and machine learning, IoT, and blockchain.

Despite these new strategies for tackling IT initiatives, transformation continues to be complex and slow. Research from a 2017 survey by Harvey Nash and KPMG[1] revealed that 88% of CIOs believe they have yet to benefit from their digital strategy. Why is this the case? Data is at the heart of every application, and from our experience in working with organizations ranging from startups to many Fortune 100 companies, realizing its full potential is still a significant challenge:

• Demands for higher developer productivity and faster time to market – with release cycles compressed to days and weeks – are being held back by traditional rigid relational data models and waterfall development.
• The inability to manage massive increases in new, rapidly changing data types – structured, semi-structured, and polymorphic data generated by new classes of web, mobile, social, and IoT applications.
• Difficulty in exploiting the wholesale shift to distributed systems and cloud computing that enable developers to access on-demand, highly scalable compute and storage infrastructure, while meeting a whole new set of regulatory demands for data sovereignty.

1. https://home.kpmg.com/xx/en/home/insights/2017/05/harvey-nash-kpmg-cio-survey-2017.html

MongoDB responded to these challenges by creating a technology foundation that enables development teams through:

1. The document data model – presenting them the best way to work with data.
2. A distributed systems design – allowing them to intelligently put data where they want it.
3. A unified experience that gives them the freedom to run anywhere – allowing them to future-proof their work and eliminate vendor lock-in.

With these capabilities, we allow you to build an Intelligent Operational Data Platform, underpinned by MongoDB. In this guide, we dive deeper into each of the three technology foundations above.

The Best Way to Work with Data: The Document Model

Relational databases have a long-standing position in most organizations.
This made them the default way to think about storing, using, and enriching data. But enterprises are increasingly encountering limitations of this technology. Modern applications present new challenges that stretch the limits of what's possible with a relational database.

As organizations seek to build these modern applications, they find that the key differentiator for success is their development teams. Developers are on the front lines of digital transformation, and enabling them to work faster produces compounding benefits for the organization. To realize the full potential of data and software, developers turn to technologies that enable rather than hinder them.

Through strategies such as Agile and DevOps, microservices, cloud replatforming, and more, many organizations have made significant progress in refactoring and evolving application tier code to respond faster to changing business requirements. But they then find themselves hampered by the rigidity and complexity of relational databases. Organizations need a fresh way to work with data. In order to handle the complex data of modern applications and simultaneously increase development velocity, the key is a platform that is:

• Easy, letting them work with data in a natural, intuitive way
• Flexible, so that they can adapt and make changes quickly
• Fast, delivering great performance with less code
• Versatile, supporting a wide variety of data models, relationships, and queries

MongoDB's document model delivers these benefits for developers, making it the best way to work with data.

Easy: A Natural, Intuitive Data Model

Relational databases use a tabular data model, storing data across many tables. An application of any complexity easily requires hundreds or even thousands of tables. This sprawl occurs because of the way the tabular model treats data. The conceptual model of application code typically resembles the real world. That is, objects in application code, including their associated data, usually correspond to real-world entities: customers or users, products, and so on. Relational databases, however, require a different structure. Because of the need to normalize data, a logical entity is typically represented in many separate parent-child tables linked by foreign keys. This data model doesn't resemble the entity in the real world, or how that entity is expressed as an object in application code.

This difference makes it difficult for developers to reason about the underlying data model while writing code, slowing down application development; this is sometimes referred to as object-relational impedance mismatch. One workaround for this is to employ an object-relational mapping layer (ORM). But this creates its own challenges, including managing the middleware and revising the mapping whenever either the application code or the database schema changes.

Figure 1: Modeling a customer with the relational database: data is split across multiple tables

In contrast to this tabular model, MongoDB uses a document data model. Documents are a much more natural way to describe data. They present a single data structure, with related data embedded as sub-documents and arrays. This allows documents to be closely aligned to the structure of objects in the programming language. As a result, it's simpler and faster for developers to model how data in the application will map to data stored in the database. It also significantly reduces the barrier-to-entry for new developers who begin working on a project – for example, adding new microservices to an existing app.
This JSON document demonstrates how a customer object is modeled in a single, rich document structure with nested arrays and sub-documents:

    {
      "_id": ObjectId("5ad88534e3632e1a35a58d00"),
      "name": { "first": "John", "last": "Doe" },
      "address": [
        { "location": "work",
          "address": { "street": "16 Hatfields", "city": "London",
                       "postal_code": "SE1 8DJ" },
          "geo": { "type": "Point", "coord": [ 51.5065752, -0.109081 ] } },
        {...}
      ],
      "phone": [
        { "location": "work", "number": "+44-1234567890" },
        {...}
      ],
      "dob": ISODate("1977-04-01T05:00:00Z"),
      "retirement_fund": NumberDecimal("1292815.75")
    }

MongoDB stores data as JSON (JavaScript Object Notation) documents in a binary representation called BSON (Binary JSON). Unlike other databases that store JSON data as simple strings and numbers, the BSON encoding extends the JSON representation to include additional types such as int, long, date, floating point, and decimal128 – the latter is especially important for high precision, lossless financial and scientific calculations. This makes it much easier for developers to process, sort, and compare data. BSON documents contain one or more fields, and each field contains a value of a specific data type, including arrays, binary data, and sub-documents.

MongoDB provides native drivers for all popular programming languages and frameworks to make development easy and natural. Supported drivers include Java, JavaScript, C#/.NET, Python, Perl, PHP, Scala, and others, in addition to 30+ community-developed drivers. MongoDB drivers are designed to be idiomatic for the given programming language.

MongoDB Compass, the GUI for MongoDB, makes it easy to explore and manipulate your data. Visualize the structure of data in MongoDB, run ad hoc queries and evaluate their performance, view and create indexes, build data validation rules, and more. Compass provides an intuitive interface for working with MongoDB.

Flexible: Dynamically Adapting to Changes

The tabular data model is rigid. It was built for structured data, where each record in a table has identical columns. While it's possible to handle polymorphism and semi-structured or unstructured data, it's clumsy, and working around the basic data limitations of the tabular model takes up development time. Furthermore, the tabular model demands that the schema be pre-defined, with any changes requiring schema migrations. Practically, this means that developers need to plan their data structure well in advance, which imposes friction on the development process when adding features or making application updates that require schema changes. This is a poor match for agile, iterative development models.

MongoDB documents are polymorphic – fields can vary from document to document within a single collection (analogous to a table in a tabular database). For example, all documents that describe customers might contain the customer ID and the last date they purchased a product or service, but only some of these documents might contain the user's social media handle, or location data from a mobile app. There is no need to declare the structure of documents to the system – documents are self-describing. If a new field needs to be added to a document, the field can be created without affecting all other documents in the system, without updating a central system catalog, and without taking the database offline.
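A minimal mongo shell sketch of this flexibility (the collection and field names here are illustrative, not taken from the original paper): two differently shaped customer documents coexist in one collection, and a new field is added to a single document with no schema migration.

    // Two customers with different shapes in the same collection
    db.customers.insertOne({
      customer_id: 1001,
      last_purchase: ISODate("2018-04-01T00:00:00Z")
    })
    db.customers.insertOne({
      customer_id: 1002,
      last_purchase: ISODate("2018-04-02T00:00:00Z"),
      twitter_handle: "@example",     // only some documents carry this field
      last_location: { type: "Point", coordinates: [ -0.1090, 51.5065 ] }
    })

    // Add a new field to one document; no ALTER TABLE, no downtime
    db.customers.updateOne(
      { customer_id: 1001 },
      { $set: { loyalty_tier: "gold" } }
    )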
Developers can start writing code and persist objects as they are created. And when they need to add more features, MongoDB continues to store the updated objects without the need to perform costly ALTER TABLE operations – or worse, having to redesign the schema from scratch. Even trivial changes to an existing relational data model result in a complex dependency chain – from updating ORM class-table mappings to programming language classes that have to be recompiled and code changed accordingly.

Schema Governance

While MongoDB's flexible schema is a powerful feature, there are situations where strict guarantees on the schema's data structure and content are required. Unlike NoSQL databases that push enforcement of these controls back into application code, MongoDB provides schema validation within the database via syntax derived from the proposed IETF JSON Schema standard.

Using schema validation, DevOps and DBA teams can define a prescribed document structure for each collection, with the database rejecting any documents that do not conform to it. Administrators have the flexibility to tune schema validation according to use case – for example, if a document fails to comply with the defined structure, it can either be rejected, or written to the collection while logging a warning message. Structure can be imposed on just a subset of fields – for example, requiring a valid customer name and address, while other fields can be freeform. With schema validation, DBAs can apply data governance standards to their schema, while developers maintain the benefits of a flexible document model.
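A sketch of schema validation using the $jsonSchema syntax introduced in MongoDB 3.6 (the collection and required fields are hypothetical): the validator requires a name and address while leaving all other fields freeform, and validationAction can be set to "warn" to log a message instead of rejecting the write.

    db.createCollection("customers", {
      validator: {
        $jsonSchema: {
          bsonType: "object",
          required: [ "name", "address" ],   // enforce just a subset of fields
          properties: {
            name: {
              bsonType: "object",
              required: [ "first", "last" ],
              properties: {
                first: { bsonType: "string" },
                last:  { bsonType: "string" }
              }
            },
            address: { bsonType: "array" }
          }
        }
      },
      validationAction: "error"   // or "warn" to accept the write and log a warning
    })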
Fast: Great Performance

The normalization of data in the tabular model means that accessing data for an entity, such as our customer example earlier, typically requires JOINing multiple tables together. JOINs entail a performance penalty, even when optimized – which takes time, effort, and advanced SQL skills.

In MongoDB, a document is a single place for the database to read and write data for an entity. The complete document can be accessed in a single database operation that avoids the need internally to pull data from many different tables and rows. For most queries, there's no need to JOIN multiple records. Should your application access patterns require it, MongoDB does provide the equivalent of a JOIN – the ability to $lookup[2] between multiple collections. This is very useful for analytics workloads, but is generally not required for operational use cases.

This also simplifies query development and optimization. There's no need to write complex code to manipulate text and values into SQL and work with multiple tables. Figure 2 illustrates the difference between using the MongoDB query language[3] and SQL[4] to insert a single user record, where users have multiple properties including name, all of their addresses, phone numbers, interests, and more.

Figure 2: Comparison of SQL and MongoDB code to insert a single user

2. https://docs.mongodb.com/manual/reference/operator/aggregation/lookup/index.html
3. https://git.io/vpnxX
4. https://git.io/vpnpG

Creating Real-Time Data Pipelines with Change Streams

Further building on the "speed" theme, change streams[5] enable developers to build reactive and real-time apps for web, mobile, and IoT that can view, filter, and act on data changes as they occur in the database. Change streams enable fast and seamless data movement across distributed database and application estates, making it simple to stream data changes and trigger actions wherever they are needed, using a fully reactive programming style. Use cases enabled by MongoDB change streams include:

• Powering trading applications that need to be updated in real time as stock prices rise and fall.
• Refreshing scoreboards in multiplayer games.
• Updating dashboards, analytics systems, and search engines as operational data changes.
• Creating powerful IoT data pipelines that can react whenever the state of physical objects changes.
• Synchronizing updates across serverless and microservice architectures by triggering an API call when a document is inserted or modified.

5. https://docs.mongodb.com/manual/changeStreams/index.html
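A minimal mongo shell sketch of a change stream consumer (the collection name and filter are illustrative): watch() was introduced in MongoDB 3.6 and accepts an aggregation pipeline to filter the events it reports.

    // Open a change stream that only reports inserts and updates on orders
    var cursor = db.orders.watch(
      [ { $match: { operationType: { $in: [ "insert", "update" ] } } } ],
      { fullDocument: "updateLookup" }   // also fetch the current document for updates
    )

    // Block until the next change arrives, then act on it
    while (cursor.hasNext()) {
      var change = cursor.next()
      printjson(change.fullDocument)     // e.g., push to a dashboard or call an API
    }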
Versatile: Various Data Models and Access Patterns

Building upon the ease, flexibility, and speed of the document model, MongoDB enables developers to satisfy a range of application requirements, both in the way data is modeled and how it is queried.

The flexibility and rich data types of documents make it possible to model data in many different structures, representative of entities in the real world. The embedding of arrays and sub-documents makes documents very powerful at modeling complex relationships and hierarchical data, with the ability to manipulate deeply nested data without the need to rewrite the entire document. But documents can also do much more: they can be used to model flat, table-like structures, simple key-value pairs, text, geospatial data, the nodes and edges used in graph processing, and more.

With an expressive query language, documents can be queried in many ways (see Table 1) – from simple lookups and range queries to creating sophisticated processing pipelines for data analytics and transformations, through to faceted search, JOINs, geospatial processing, and graph traversals. This is in contrast to most distributed databases, which offer little more than simple key-value access to your data. The MongoDB query model is also implemented as methods or functions within the API of a specific programming language, as opposed to a completely separate language like SQL. This, coupled with the affinity between MongoDB's JSON document model and the data structures used in object-oriented programming, further speeds developer productivity. For a complete list of drivers see the MongoDB Drivers documentation.

Expressive Queries:
• Find anyone with phone # "1-212…"
• Check if the person with number "555…" is on the "do not call" list

Geospatial:
• Find the best offer for the customer at geo coordinates of 42nd St. and 6th Ave

Text Search:
• Find all tweets that mention the firm within the last 2 days

Faceted Navigation:
• Filter results to show only products <$50, size large, and manufactured by ExampleCo

Aggregation:
• Count and sort number of customers by city, compute min, max, and average spend

Native Binary JSON Support:
• Add an additional phone number to Mark Smith's record without rewriting the document at the client
• Sort on the modified date

Fine-grained Array Operations:
• Update just 2 phone numbers out of 10
• In Mark Smith's array of test scores, update every score <70 to be 0

JOIN ($lookup):
• Query for all San Francisco residences, lookup their transactions, and sum the amount by person

Graph Queries ($graphLookup):
• Query for all people within 3 degrees of separation from Mark

Table 1: MongoDB's rich query functionality

MongoDB's versatility is further supported by its indexing capabilities. Queries can be performed quickly and efficiently with an appropriate indexing strategy. MongoDB permits secondary indexes to be declared on any field, including fields within arrays. Indexes can be created and dropped at any time to easily support changing application requirements and query patterns. Index types include compound indexes, text indexes, geospatial indexes, and more. Further, indexes can be created with special properties to enforce data rules or support certain workloads – for example, to expire data according to retention policies or guarantee uniqueness of the indexed field within a collection. Table 2 summarizes the indexes available with MongoDB.

Index Types:
• Primary Index: Every collection has a primary key index
• Compound Index: Index against multiple keys in the document
• MultiKey Index: Index into arrays
• Text Indexes: Support for text searches
• GeoSpatial Indexes: 2d & 2dSphere indexes for spatial geometries
• Hashed Indexes: Hash-based values for sharding

Index Features:
• TTL Indexes: Single field indexes; when expired, delete the document
• Unique Indexes: Ensures value is not duplicated
• Partial Indexes: Expression-based indexes, allowing indexes on subsets of data
• Case Insensitive Indexes: Supports text search using case insensitive search
• Sparse Indexes: Only index documents which have the given field

Table 2: MongoDB offers fully-featured secondary indexes
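A few mongo shell sketches of the query and index capabilities summarized in Tables 1 and 2 (collection names, field names, and values are illustrative):

    // Range query with a sort on the modified date
    db.customers.find({ spend: { $gte: 50, $lt: 500 } }).sort({ modified: -1 })

    // Aggregation: count customers by city and compute average spend
    db.customers.aggregate([
      { $group: { _id: "$city", count: { $sum: 1 }, avgSpend: { $avg: "$spend" } } },
      { $sort: { count: -1 } }
    ])

    // JOIN-equivalent: look up each customer's transactions
    db.customers.aggregate([
      { $lookup: { from: "transactions", localField: "customer_id",
                   foreignField: "customer_id", as: "txns" } }
    ])

    // Secondary indexes with special properties
    db.customers.createIndex({ last_name: 1, city: 1 })                      // compound
    db.sessions.createIndex({ createdAt: 1 }, { expireAfterSeconds: 3600 })  // TTL
    db.customers.createIndex({ email: 1 }, { unique: true })                 // unique
    db.customers.createIndex(                                                // partial
      { twitter_handle: 1 },
      { partialFilterExpression: { twitter_handle: { $exists: true } } }
    )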
Data Consistency Guarantees

MongoDB's versatility also extends to data consistency requirements. As a distributed system, MongoDB handles the complexity of maintaining multiple copies of data via replication (see the Availability section below). Read and write operations are directed to the primary replica by default for strong consistency, but users can choose to read from secondary replicas for reduced network latency, especially when users are geographically dispersed, or for isolating operational and analytical workloads running in a single cluster.

When reading data from any cluster member, users can tune MongoDB's consistency model to match application requirements, down to the level of individual queries within an app. When a situation mandates the strictest linearizable or causal consistency, MongoDB will enforce it; if an application needs to only read data that has been committed to a majority of nodes (and therefore can't be rolled back in the event of a primary election) or even just to a single replica, MongoDB can be configured for this. By providing this level of tunability, MongoDB can satisfy the full range of consistency, performance, and geo-locality requirements of modern apps. When writing data, MongoDB similarly offers tunable configurations for durability requirements, discussed further in the Availability section.
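A sketch of this per-operation tuning in the mongo shell (collection and field names are illustrative): read concern controls the isolation level of a query, and write concern controls the durability acknowledgement required before a write returns.

    // Only return data that has been committed to a majority of replicas
    db.accounts.find({ owner: "Mark Smith" }).readConcern("majority")

    // Strictest single-document consistency: linearizable reads on the primary
    db.accounts.find({ _id: 42 }).readConcern("linearizable")

    // Require this write to be acknowledged by a majority of replicas within 5s
    db.accounts.insertOne(
      { owner: "Mark Smith", balance: NumberDecimal("100.00") },
      { writeConcern: { w: "majority", wtimeout: 5000 } }
    )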
Transactional Model

Because documents can bring together related data that would otherwise be modelled across separate parent-child tables in a tabular schema, MongoDB's atomic single-document operations provide transaction semantics that meet the data integrity needs of the majority of applications. One or more fields may be written in a single operation, including updates to multiple sub-documents and elements of an array. The guarantees provided by MongoDB ensure complete isolation as a document is updated; any errors cause the operation to roll back so that clients receive a consistent view of the document.

The addition of multi-document transactions, scheduled for MongoDB 4.0[6], makes it even easier for developers to address more use cases with MongoDB. They feel just like the transactions developers are familiar with from relational databases – multi-statement, similar syntax, and easy to add to any application. Through snapshot isolation, transactions provide a globally consistent view of data, enforce all-or-nothing execution, and will not impact performance for workloads that do not require them. Learn more and take them for a spin.

6. Safe Harbour Statement: The development, release, and timing of any features or functionality described for our products remains at our sole discretion. This information is merely intended to outline our general product direction and it should not be relied on in making a purchasing decision nor is this a commitment, promise, or legal obligation to deliver any material, code, or functionality.
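A sketch of the multi-document transaction syntax previewed for MongoDB 4.0, subject to the safe harbour statement above (the database, collection, and document names are hypothetical):

    // Transfer funds atomically across two documents
    var session = db.getMongo().startSession()
    session.startTransaction({ readConcern: { level: "snapshot" },
                               writeConcern: { w: "majority" } })
    try {
      var accounts = session.getDatabase("bank").accounts
      accounts.updateOne({ _id: "A" }, { $inc: { balance: -100 } })
      accounts.updateOne({ _id: "B" }, { $inc: { balance:  100 } })
      session.commitTransaction()   // all-or-nothing: both updates commit together
    } catch (e) {
      session.abortTransaction()    // on any error, neither update is applied
      throw e
    }
    session.endSession()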
MongoDB Stitch

MongoDB Stitch is serverless for data-driven applications. Stitch streamlines application development with simple, secure access to data and services from the client – getting your apps to market faster while reducing operational costs. Stitch provides full access to your MongoDB database, in addition to public cloud services – all through an intuitive SDK. Add business logic to your backend using Stitch's hosted functions. Take advantage of Stitch's HTTP service and webhooks to integrate with your microservices and provide secure APIs. Stitch secures access to data, services, and functions through powerful, declarative rules – putting you in control.

Stitch represents the next stage in the industry's migration to a more streamlined, managed infrastructure. Virtual machines running in public clouds (notably AWS EC2) led the way, followed by hosted containers, and serverless offerings such as AWS Lambda and Google Cloud Functions. That still required backend developers to implement and manage access controls and REST APIs to provide access to microservices, public cloud services, and of course data. Frontend developers were held back by needing to work with APIs that weren't suited to rich data queries.

Put Data Where You Need It: Intelligent Distributed Systems Architecture

Mobile, web, IoT, and cloud apps have significantly changed user expectations. Once, applications were designed to serve a finite audience – typically internal business departments – in a single head office location. Now, users demand modern app experiences that must be always-on, accessible from any device, and consistently scaled with the same low-latency responsiveness wherever they are, while meeting the data sovereignty requirements demanded by new data privacy regulations.

To address these needs, MongoDB is built around an intelligent distributed systems architecture that enables developers to place data where their apps and users need it. MongoDB can be run within and across geographically distributed data centers and cloud regions, providing levels of availability, workload isolation, scalability, and data locality unmatched by relational databases. Before diving further into MongoDB's distributed systems design, let's first examine the challenges of meeting modern app needs with traditional relational databases.

Relational Database Challenges

Relational databases are monolithic systems, designed to run on a single server, typically with shared storage. Attempting to introduce distributed system properties to relational databases results in significantly higher developer and operations complexity and cost, slowing the pace of delivering new apps and evolving them in line with user requirements.

Availability

For redundancy, most relational databases support replication to mirror the database across multiple nodes, but they lack the integrated mechanisms for automatic failover and recovery between database replicas. As a result, users need to layer 3rd-party clustering frameworks and agents (sometimes called "brokers") to monitor the database and its host platform, initiating failover in the event something goes wrong (i.e., the database crashes or the underlying server stops responding). What are the downsides of this approach?

• Failover events need to be coordinated by the clustering software across the database, replication mechanism, storage, network, clients, and hosts. As a result, it can take multiple minutes to recover service to the application, during which time the app is unavailable to users.
• Clustering frameworks are often external to the database, so developers face the complexity of integrating and managing separate pieces of technology and processes, sometimes backed by different vendors. In some cases, these clustering frameworks are independently licensed from the database itself, adding cost.
• It also means additional complexity in coordinating the implementation, testing, and ongoing database maintenance across multiple teams – developers, DBAs, network administrators, and system administrators – each with their own specific areas of responsibility.

Scale-Out and Data Locality

Attempting to accommodate increasing data volumes and user populations with a database running on a single server means developers can rapidly hit a scalability wall, necessitating significant application redesign and custom engineering work. While it can be possible to use replication to scale read operations across replicas of the data – with potential risks to data consistency – relational databases have no native mechanisms to partition (shard) the database across a cluster of nodes when they need to scale writes. So developers are confronted with two options:

1. Manually partition the database at the application level, which adds significant development complexity and inhibits the ability to elastically expand and contract the cluster as workloads dictate, or as the app scales beyond the original capacity predictions.
2. Integrate a separate sharding framework for the database. Like the HA frameworks discussed above, these sharding layers are developed independently from the database, so the user has the added complexity of integrating and managing multiple, distinct pieces of technology in order to provide a complete solution.

Whatever approach is taken, developers will typically lose key relational capabilities that are at the heart of traditional RDBMS application logic: ACID transactions, referential integrity, JOINs, and full SQL expressivity for any operations that span shards. As a result, they will need to recreate this functionality back at the application tier.
MongoDB Distributed Systems Architecture

As a distributed data platform, MongoDB gives developers four essential capabilities in meeting modern application needs:

• Availability
• Workload isolation
• Scalability
• Data locality

Each is discussed in turn below.

Availability

MongoDB maintains multiple copies of data using replica sets (Figure 3). Unlike relational databases, replica sets are self-healing as failover and recovery are fully automated, so it is not necessary to manually intervene to restore a system in the event of a failure, or to add additional clustering frameworks and agents. Replica sets also provide operational flexibility by providing a way to perform systems maintenance (i.e., upgrading underlying hardware and software) using rolling replica restarts that preserve service continuity.

Figure 3: Self-healing MongoDB replica sets for continuous availability

A replica set consists of multiple database replicas. To maintain strong data consistency, one member assumes the role of the primary replica against which all write operations are applied (as discussed later, MongoDB automatically shards the data set across multiple nodes to scale write operations beyond a single primary node). The other members of the replica set act as secondaries, replicating all data changes from the oplog (operations log). The oplog contains an ordered set of idempotent operations that are replayed on the secondaries.

If the primary replica set member suffers an outage (e.g., a power failure, hardware fault, network partition), one of the secondary members is automatically elected to primary, typically within several seconds, and the client connections automatically fail over to that new primary. Any writes that could not be serviced during the election can be automatically retried by the drivers once a new primary is established, with the MongoDB server enforcing exactly-once processing semantics. Retryable writes enable MongoDB to ensure write availability, without sacrificing data consistency.

The replica set election process is controlled by sophisticated algorithms based on an extended implementation of the Raft consensus protocol. Not only does this allow fast failover to maximize service availability, the algorithm ensures that only the most suitable secondary members are evaluated for election to primary and reduces the risk of unnecessary failovers (also known as "false positives"). Before a secondary replica is promoted, the election algorithms evaluate a range of parameters including:

• Analysis of election identifiers, timestamps, and journal persistence to identify those replica set members that have applied the most recent updates from the primary member.
• Heartbeat and connectivity status with the majority of other replica set members.
• User-defined priorities assigned to replica set members. For example, administrators can configure all replicas located in a remote region to be candidates for election only if the entire primary region fails.

Once the election process has determined the new primary, the secondary members automatically start replicating from it. When the original primary comes back online, it will recognize its change in state and automatically assume the role of a secondary, applying all write operations that have occurred during its outage.

The number of replicas in a MongoDB replica set is configurable, with a larger number of replica members providing increased data durability and protection against database downtime (e.g., in case of multiple machine and regional failures, network partitions), or to isolate operational and analytical workloads running on the same cluster. Up to 50 members can be configured per replica set, providing operational flexibility and wide data distribution across multiple geographic sites, co-locating data in close proximity to remote users.

Extending flexibility, developers can configure replica sets to provide tunable, multi-node durability, and geographic awareness. For example, they can:

• Ensure write operations propagate to specific members of a replica set, deployed locally and in remote regions. MongoDB's write concern can be configured in such a way that writes are only acknowledged once specific policies have been fulfilled, such as writing to at least two replica set members in one region and at least one replica in a second region. This reduces the risk of data loss in the event of a complete data center outage.
• Ensure that specific members of a replica set respond to queries – for example, based on their physical location. The nearest read preference allows the client to read from the lowest-latency members of a replica set. This is typically used to route queries to a local data center, thus reducing the effects of geographic latency, while being able to immediately fall back to the next nearest if the closest node goes down. Tags can also be used to ensure that reads are always routed to a specific node or subset of nodes.
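A sketch of initiating a three-member replica set with priorities and locality tags in the mongo shell (the host names, tag values, and replica set name are hypothetical):

    rs.initiate({
      _id: "rs0",
      members: [
        { _id: 0, host: "nyc-1.example.net:27017", priority: 2,
          tags: { region: "us-east" } },
        { _id: 1, host: "nyc-2.example.net:27017", priority: 1,
          tags: { region: "us-east" } },
        // remote member: lower priority, so it is elected only as a last resort
        { _id: 2, host: "lon-1.example.net:27017", priority: 0.5,
          tags: { region: "eu-west" } }
      ]
    })

    // Route reads to the lowest-latency member, wherever it is
    db.customers.find({ customer_id: 1001 }).readPref("nearest")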
Workload Isolation

Beyond using replication for redundancy and availability, replica sets also provide a foundation for combining different classes of workload on the same MongoDB cluster, each operating against its own copy of the data. With workload isolation, business analysts can run exploratory queries and generate reports, and data scientists can build machine learning models without impacting operational applications.

Figure 4: Replica sets enable global data distribution

Within a replica set, one set of nodes can be provisioned to serve operational applications, replicating data in real time to other nodes dedicated to serving analytic workloads. By using MongoDB's native replication to move data in real time between the different node types, developers avoid lengthy and fragile ETL cycles, while analysts can improve both the speed and quality of insights and decision making by working with fresh, rather than aged and potentially stale, data.

With the operational and analytic workloads isolated from one another on different replica set nodes, they never contend for resources. Replica set tags allow read operations to be directed to specific nodes within the cluster, providing physical isolation between analytics and operational queries. Different indexes can even be created for the analytics nodes, allowing developers to optimize for multiple query patterns. Data is exposed through MongoDB's rich query language, along with the Connector for BI and Connector for Spark, to support real-time analytics and data visualization.

Figure 5: Combining operational and analytics workloads on a single data platform

Scalability

To meet the needs of apps with large data sets and high throughput requirements, MongoDB provides horizontal scale-out for databases on low-cost, commodity hardware or cloud infrastructure using a technique called sharding. Sharding automatically partitions and distributes data across multiple physical instances called shards. Each shard is backed by a replica set to provide always-on availability and workload isolation. Sharding allows developers to seamlessly scale the database as their apps grow beyond the hardware limits of a single server, and it does this without adding complexity to the application. To respond to workload demand, nodes can be added or removed from the cluster in real time, and MongoDB will automatically rebalance the data accordingly, without manual intervention.

Figure 6: Automatic sharding for horizontal scale-out

Sharding is transparent to applications; whether there is one or a thousand shards, the application code for querying MongoDB remains the same. Applications issue queries to a query router that dispatches the query to the appropriate shards. For key-value queries that are based on the shard key, the query router will dispatch the query to the shard that manages the document with the requested key. When using range-based sharding, queries that specify ranges on the shard key are only dispatched to shards that contain documents with values within the range. For queries that don't use the shard key, the query router will broadcast the query to all shards, aggregating and sorting the results as appropriate. Multiple query routers can be used within a MongoDB cluster, with the appropriate number governed by the performance and availability requirements of the application.

Unlike relational databases, MongoDB sharding is automatic and built into the database. Developers don't face the complexity of building sharding logic into their application code, which then needs to be updated as data is migrated across shards. They don't need to integrate additional clustering software or expensive shared-disk infrastructure to manage process and data distribution, or failure recovery.

By simply hashing a primary key value, many distributed databases randomly spray data across a cluster of nodes, imposing performance penalties when data is queried, or adding complexity when data needs to be localized to specific nodes. By exposing multiple sharding policies to developers, MongoDB offers a better approach. Data can be distributed according to query patterns or data placement requirements, giving developers much higher scalability across a diverse set of workloads:

• Ranged Sharding. Documents are partitioned across shards according to the shard key value. Documents with shard key values close to one another are likely to be co-located on the same shard. This approach is well suited for applications that need to optimize range-based queries, such as co-locating data for all customers in a specific region on a specific shard.
• Hashed Sharding. Documents are distributed according to an MD5 hash of the shard key value. This approach guarantees a uniform distribution of writes across shards, which is often optimal for ingesting streams of time-series and event data.
• Zoned Sharding. Provides the ability for developers to define specific rules governing data placement in a sharded cluster. Zones are discussed in more detail in the following Data Locality section of the guide.

Thousands of organizations use MongoDB to build high-performance systems at scale. You can read more about them on the MongoDB scaling page.
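A sketch of enabling ranged and hashed sharding in the mongo shell (the database, collection, and shard key names are illustrative):

    // Shard the orders collection by customer region and id (ranged sharding)
    sh.enableSharding("commerce")
    sh.shardCollection("commerce.orders", { region: 1, customer_id: 1 })

    // Shard an event stream on a hashed key for uniform write distribution
    sh.shardCollection("commerce.events", { _id: "hashed" })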
Data Locality

MongoDB zoned sharding allows precise control over where data is physically stored in a cluster. This allows developers to accommodate a range of application needs – for example, controlling data placement by geographic region for latency and governance requirements, or by hardware configuration and application feature to meet a specific class of service. Data placement rules can be continuously refined by modifying shard key ranges, and MongoDB will automatically migrate the data to its new zone. The most popular use cases for MongoDB zones include the following:

Geographic Data Placement

MongoDB gives developers the ability to create zones in multiple geographic regions. Each zone is part of the same, single cluster and can be queried globally, but data is pinned to shards in specific regions based on data locality requirements. Developers simply name a shard by region, tag their documents by region in the shard key, and MongoDB does the rest.

By associating data to shards based on regional policies, developers can create global, always-on, write-everywhere clusters, with each shard serving operations local to it – enabling the database to serve distributed, write-heavy workloads with low latency. This design brings the benefits of a "multi-master" database, without introducing the complexity of eventual consistency or data loss caused by conflicting writes. Zoned sharding also enables developers to keep user data within specific regions to meet governance requirements for data sovereignty, such as the EU's GDPR.

To illustrate further, an application may have users in North America, Europe, and China. The developer can assign each shard to a zone representing the physical location (North America, Europe, or China) of that shard's servers, and then map all documents to the correct zone based on their region field. Any number of shards can be associated with each zone, and each zone can be scaled independently of the others – for instance, accommodating faster user growth in China than North America. Learn more by reviewing our tutorial on creating geographically distributed clusters with MongoDB zoned sharding.

Class of Service

Data for a specific application feature or customer can be associated with specific zones. For instance, a company offering Software-as-a-Service (SaaS) may assign users on its free usage tier to shards provisioned on lower-specified hardware, while paying customers are allocated to premium infrastructure. The SaaS provider has the flexibility to scale parts of the cluster differently for free users and paying customers. For example, the free tier can be allocated just a few shards, while paying customers can be assigned to dozens of shards. Learn more by reviewing our tutorial on configuring application affinity with MongoDB zoned sharding.
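A sketch of the geographic placement pattern using the zone management commands (the shard names, zone names, and key ranges are hypothetical; the collection is assumed to be sharded on { region: 1, user_id: 1 }):

    // Assign shards to geographic zones
    sh.addShardToZone("shard-na-0", "NA")
    sh.addShardToZone("shard-eu-0", "EU")

    // Pin documents to zones by the leading region field of the shard key
    sh.updateZoneKeyRange("commerce.users",
      { region: "NA", user_id: MinKey }, { region: "NA", user_id: MaxKey }, "NA")
    sh.updateZoneKeyRange("commerce.users",
      { region: "EU", user_id: MinKey }, { region: "EU", user_id: MaxKey }, "EU")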
Building upon application features, zoned sharding also enables deployment patterns such as tiered, or multi-temperature, storage. Different subsets of data often have different response time requirements, usually based on access frequency and age of the data. For example, IoT applications or social media services handling time-series data will demand that users experience the lowest latency when accessing the latest data. This data can be pinned to the highest performance hardware with fast CPUs and SSDs. Meanwhile, aged data sets that are read less frequently typically have relaxed latency SLAs, so can be moved onto slower, less expensive hardware based on conventional, high-capacity spinning disks. By including a timestamp in the shard key, the MongoDB cluster balancer can migrate data based on age from the high-performance tier to the active archive tier.

Figure 7: Implementing tiered storage with MongoDB zoned sharding

Learn more by reviewing our tutorial on configuring tiered storage with MongoDB zoned sharding.

Data Security

Having the freedom to put data where it's needed enables developers to build powerful new classes of application. However, they must also be confident that their data is secure, wherever it is stored. Rather than building security controls back into the application, they should be able to rely on the database to implement the mechanisms needed to protect sensitive data and meet the needs of apps in regulated industries. MongoDB features extensive capabilities to defend, detect, and control access to data:

• Authentication. Simplifying access control to the database, MongoDB offers integration with external security mechanisms including LDAP, Windows Active Directory, Kerberos, and x.509 certificates. In addition, IP whitelisting allows DevOps teams to configure MongoDB to only accept external connections from approved IP addresses.
• Authorization. Role-Based Access Control (RBAC) enables DevOps teams to configure granular permissions for a user or an application based on the privileges they need to do their job. These can be defined in MongoDB, or centrally within an LDAP server. Additionally, developers can define views that expose only a subset of data from an underlying collection, i.e., a view that filters or masks specific fields, such as Personally Identifiable Information (PII) from customer data or health records. Views can also be created to only expose aggregated data.
• Auditing. For regulatory compliance, security administrators can use MongoDB's native audit log to track any database operations – whether DML or DDL.
• Encryption. MongoDB data can be encrypted on the network, on disk, and in backups. With the Encrypted storage engine, protection of data-at-rest is an integral feature within the database. By natively encrypting database files on disk, developers eliminate both the management and performance overhead of external encryption mechanisms. Only those staff who have the appropriate database authorization credentials can access the encrypted data, providing additional levels of defense.

To learn more, download the MongoDB Security Reference Architecture Whitepaper.
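A sketch of the view-based authorization pattern described above (the collection, view, role, and field names are illustrative): a read-only view hides PII fields, and a role grants access to the view instead of the underlying collection.

    // Expose customers without PII: strip SSNs, birth dates, and phone numbers
    db.createView("customers_public", "customers", [
      { $project: { ssn: 0, dob: 0, "phone.number": 0 } }
    ])

    // Grant analysts read access to the view only, not the source collection
    db.createRole({
      role: "analystRead",
      privileges: [
        { resource: { db: "commerce", collection: "customers_public" },
          actions: [ "find" ] }
      ],
      roles: []
    })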
Freedom to Run Anywhere

An increasing number of companies are moving to the public cloud to not only reduce the operational overhead of managing infrastructure, but also provide their teams with on-demand services that make it easier to build and run an application backend. This move from building IT to consuming IT as a service is well aligned with a parallel organizational shift happening across companies prioritizing productivity and getting to market faster – a move from specialized and often siloed groups to more cross-functional, DevOps teams that are able to make many of their own technology decisions. The result is often a far more nimble and focused organization that is able to rapidly deliver new digital products using agile methodologies and modern application architectures, such as microservices.

However, relational databases that have been designed to run on a single server are architecturally misaligned with modern cloud platforms, which are built from low-cost commodity hardware and designed to scale out as more capacity is needed. For example, cloud applications with uneven usage or spikes during certain periods require built-in elasticity and scalability across the supporting technology stack. Legacy relational databases do not natively support these capabilities, requiring teams to try to introduce distributed systems properties through approaches such as application-level sharding.

It's for this reason that modern, non-tabular databases delivered as a service are growing in popularity amongst organizations moving into the cloud. But many of these database services run exclusively in a single cloud platform, which increases business risk. For the past decade, companies have increasingly adopted open source technologies to reduce lock-in with proprietary vendors. Choosing to build applications on a proprietary cloud database re-introduces the risk of lock-in to cloud vendor APIs and technologies that only run in a single environment. To reduce the likelihood of cloud lock-in, teams should build their applications on distributed databases that will deliver a consistent experience across any environment.

As an open source database, MongoDB can be deployed anywhere – from mainframes to a private cloud to the public cloud. The developer experience is entirely unaffected by the deployment model; similarly, teams responsible for standing up databases, maintaining them, and optimizing performance can leverage a unified set of tools that deliver the same experience across different environments. MongoDB allows organizations to adopt cloud at their own pace by moving select workloads as needed. For example, they may run the same workload in a hybrid environment to manage sudden peaks in demand, or use the cloud to launch services in regions where they lack a physical data center presence.

MongoDB Atlas

Similar to the way MongoDB and Stitch dramatically improve developer productivity, MongoDB offers a fully managed, on-demand and elastic service, called MongoDB Atlas, in the public cloud. Atlas enables customers to deploy, operate, and scale MongoDB databases on AWS, Azure, or GCP in just a few clicks or programmatic API calls. Atlas allows customers to adopt a more agile, on-demand approach to IT, rather than underutilizing cloud as merely a hosted infrastructure platform and replicating many of the same operational, administrative, and time-to-market challenges of running on-premises. Built-in automation and proven best practices reduce the likelihood of human error and minimize operational overhead. Key features of MongoDB Atlas include:

Comprehensive monitoring and performance optimization. MongoDB Atlas includes an integrated set of features that simplify database monitoring and performance optimization.
Developers can get deep visibility into their clusters using optimized charts tracking dozens of key metrics, and easily customize and send alerts to channels such as Slack, Datadog, and PagerDuty. MongoDB Atlas also allows customers to see what's happening in their clusters as it happens with the Real-Time Performance Panel, and to take advantage of automatically generated index suggestions via the built-in Performance Advisor to improve query performance. Finally, the built-in Data Explorer lets operations teams run queries to review document structure and database schema, view collection metadata, and inspect index usage statistics.

Automation and elasticity. MongoDB Atlas automates infrastructure provisioning, setup, and deployment so teams can get the database resources they need, when they need them. Patches and minor version upgrades are applied automatically. Database modifications – whether it's to scale out or perform an upgrade – can be executed in a few clicks or an API call with no downtime window required.

High availability and durability. MongoDB Atlas automatically creates self-healing, geographically distributed clusters with a minimum of 3 nodes to ensure no single point of failure. Even better availability guarantees are possible by enabling cross-region replication to achieve multi-region fault tolerance. MongoDB Atlas also includes powerful features to enhance reliability for mission-critical production databases, such as continuous, incremental backups with point-in-time recovery and queryable snapshots, which allow customers to restore granular data sets in a fraction of the time it would take to restore an entire snapshot.

Secure by default. MongoDB Atlas makes it easy for organizations to control access to their managed databases by automatically incorporating many of the security features mentioned earlier in this architecture guide. For example, a customer's database instances are deployed with robust access controls and end-to-end encryption. Other security features include network isolation, IP whitelisting, VPC peering, always-on authentication, and much more.

Live migration. MongoDB Atlas makes it easy to migrate live data from MongoDB deployments running in any other environment. Atlas will perform an initial sync between the migration destination and the source database, and use the oplog to keep the two databases in sync until teams are prepared to perform the cutover process. Live migration supports importing data from replica sets, sharded clusters, and any deployment running MongoDB 2.6 or higher.

Widespread coverage on the major cloud platforms. MongoDB Atlas is available in over 50 cloud regions across Amazon Web Services, Microsoft Azure, and Google Cloud Platform. Organizations with a global user base can use MongoDB Atlas to automatically replicate data to any number of regions of their choice to deliver fast, responsive access to data wherever their users are located. Furthermore, unlike other open source database services, which vary in terms of feature support and optimizations from cloud provider to cloud provider, MongoDB Atlas delivers a consistent experience across each of the cloud platforms, ensuring developers can deploy wherever they need to, without compromising critical functionality.

You can learn about MongoDB Atlas and all of the features discussed above in the documentation. And you can take Atlas for a spin at no cost on the free tier.
MongoDB Ops Manager

For organizations that need to run the database on their own infrastructure for business or regulatory requirements, MongoDB offers SaaS and on-premises management tools that enable customers to build their own MongoDB service for internal development teams. MongoDB Ops Manager is the simplest way to run MongoDB on-premises or in a private cloud, making it easy for operations teams to deploy, monitor, backup, and scale MongoDB. The capabilities of Ops Manager are also available in the MongoDB Cloud Manager tool, delivered as SaaS in the cloud.

Deployments and upgrades. Whereas MongoDB Atlas is a fully managed database as a service platform, Ops Manager provides a powerful suite of tools that enable operations teams to implement and automate MongoDB deployment and maintenance tasks in accordance with their policies and best practices. Ops Manager coordinates critical operational tasks across the servers in a MongoDB system. It communicates with the infrastructure through agents installed on each server. The servers can reside in the public cloud or a private data center. Ops Manager reliably orchestrates the tasks that administrators have traditionally performed manually – deploying a new cluster, performing upgrades, creating point-in-time backups, and many other operational activities.

Ops Manager also makes it possible to dynamically resize capacity by adding shards and replica set members. Other maintenance tasks such as upgrading MongoDB, building new indexes across replica sets, or resizing the oplog can be reduced from dozens or hundreds of manual steps to the click of a button, all with zero downtime. Administrators can use the Ops Manager interface directly, or invoke the Ops Manager RESTful API from existing enterprise tools.

Ops Manager features such as server pooling make it easier to build a database as a service within a private cloud environment. Ops Manager will maintain a pool of globally provisioned servers that have agents already installed. When users want to create a new MongoDB deployment, they can request servers from this pool to host the MongoDB cluster. Administrators can even associate certain properties with the servers in the pool and expose server properties as selectable options when a user initiates a request for new instances.

Comprehensive monitoring and performance optimization. The monitoring, alerting, and performance optimization capabilities of Ops Manager and Cloud Manager are similar to what's available with MongoDB Atlas. Integration with existing monitoring tools is straightforward via the Ops Manager and Cloud Manager RESTful API, and with packaged integrations to leading Application Performance Management (APM) platforms, such as New Relic. These integrations allow MongoDB status to be consolidated and monitored alongside the rest of your application infrastructure, all from a single pane of glass.

Disaster recovery: backups & point-in-time recovery. Similar to how backups are handled in MongoDB Atlas, Ops Manager and Cloud Manager backups are maintained continuously, just a few seconds behind the operational system. Because Ops Manager reads the oplog used for replication, the ongoing performance impact is minimal – similar to that of adding an additional replica to a replica set. If the MongoDB cluster experiences a failure, the most recent backup is only moments behind, minimizing exposure to data loss.
Ops Manager and Cloud Manager both also offer point-in-time backup of replica sets and cluster-wide snapshots of sharded clusters. Users can restore to precisely the moment they need, quickly and safely. Automation-driven restores allow a fully configured cluster to be re-deployed directly from the database snapshots in just a few clicks. Similar to MongoDB Atlas, Ops Manager and Cloud Manager also provide the ability to query backup snapshots.

Ops Manager can also be deployed to control backups to a local data center or AWS S3. If using Cloud Manager, customers receive a fully managed backup solution with a pay-as-you-go model. Dedicated MongoDB engineers monitor user backups on a 24x365 basis, alerting operations teams if problems arise.

Cloud Adoption Stages

By building on a database that runs the same across any environment, and using an integrated set of management tooling that delivers a consistent experience across the board, organizations can ensure a seamless journey from on-premises to the public cloud:

• Teams dipping their toe into the cloud can start with MongoDB on-premises and optimize ongoing management using Ops Manager. Through integration with OpenShift and Cloud Foundry, Ops Manager can be used as a foundation for your own private cloud database service.
• As their level of comfort with the public cloud increases, they can migrate a few deployments and self-manage using Cloud Manager, or try the fully managed, on-demand, as-a-service approach with MongoDB Atlas.
• Cloud-first organizations interested in exploiting the benefits of a multi-cloud strategy can use MongoDB Atlas to easily spin up clusters and replicate data across regions and cloud providers, all without worrying about operations or platform lock-in.

Figure 8: MongoDB provides you the freedom to run anywhere

Conclusion and Next Steps

Every industry is being transformed by data and digital technologies. As you build or remake your company for a digital world, speed matters – measured by how fast you build apps, how fast you scale them, and how fast you can gain insights from the data they generate. These are the keys to applications that provide better customer experiences, enable deeper, data-driven insights, or make new products or business models possible. With its intelligent operational data platform, MongoDB enables developers through:

1. The document data model – presenting the best way to work with data.
2. A distributed systems design – allowing them to intelligently put data where they want it.
3. A unified experience that gives them the freedom to run anywhere – allowing them to future-proof their work and eliminate vendor lock-in.

In this guide we have explored the fundamental concepts that underpin the architecture of MongoDB. Other guides on topics such as performance, operations, and security best practices can be found at mongodb.com.

You can get started now with MongoDB by:

1. Spinning up a fully managed MongoDB instance on the Atlas free tier
2. Downloading MongoDB for your own environment
3. Reviewing the MongoDB manuals and tutorials on our documentation page

We Can Help

We are the MongoDB experts. Over 5,700 organizations rely on our commercial products. We offer software and services to make your life easier:

MongoDB Enterprise Advanced is the best way to run MongoDB in your data center. It's a finely-tuned package of advanced software, support, certifications, and other services designed for the way you do business.
MongoDB Atlas is a database as a service for MongoDB, letting you focus on apps instead of ops. With MongoDB Atlas, you only pay for what you use with a convenient hourly billing model. With the click of a button, you can scale up and down when you need to, with no downtime, full security, and high performance.

MongoDB Stitch is a backend as a service (BaaS), giving developers full access to MongoDB, declarative read/write controls, and integration with their choice of services.

MongoDB Cloud Manager is a cloud-based tool that helps you manage MongoDB on your own infrastructure. With automated provisioning, fine-grained monitoring, and continuous backups, you get a full management suite that reduces operational overhead, while maintaining full control over your databases.

MongoDB Consulting packages get you to production faster, help you tune performance in production, help you scale, and free you up to focus on your next release.

MongoDB Training helps you become a MongoDB expert, from design to operating mission-critical systems at scale. Whether you're a developer, DBA, or architect, we can make you better at MongoDB.

Resources

For more information, please visit mongodb.com or contact us at sales@mongodb.com.

Case Studies (mongodb.com/customers)
Presentations (mongodb.com/presentations)
Free Online Training (university.mongodb.com)
Webinars and Events (mongodb.com/events)
Documentation (docs.mongodb.com)
MongoDB Enterprise Download (mongodb.com/download)
MongoDB Atlas database as a service for MongoDB (mongodb.com/cloud)
MongoDB Stitch backend as a service (mongodb.com/cloud/stitch)

US 866-237-8815 • INTL +1-650-440-4474 • info@mongodb.com
© 2018 MongoDB, Inc. All rights reserved.