ThoughtSpot Disaster Recovery Guide 4.2
Version 4.2, February 2017
Copyright © 2017 by ThoughtSpot. All Rights Reserved.

Table of Contents

Chapter 1: About disaster recovery
• Disk failure
• Node failure
• Cluster replacement
  • Mount a NAS file system
  • Set up a disaster recovery configuration
  • Recover a cluster from a disaster

Chapter 1: About disaster recovery

Topics:
• Disk failure
• Node failure
• Cluster replacement

Disaster recovery is the ability to recover from a hardware or software failure or a catastrophic event. ThoughtSpot protects you from data loss in such events: it takes snapshots of itself automatically at periodic intervals, and these snapshots can be pulled out as backups, either on a schedule or manually as needed. See the ThoughtSpot Administrator Guide for details on backups, snapshots, and restore operations. The information here addresses disaster recovery specifically.

These are some potential types of failure, listed in increasing order of severity:
• Disk failure
• Node failure
• Cluster replacement

ThoughtSpot supports recovery from disk or node failure within each appliance. You can also architect your system to survive the loss of an entire appliance, which is the highest level of disaster recovery.

Disk failure

ThoughtSpot replicates stored data, so when a disk goes bad, ThoughtSpot continues to operate. In this event, replacement of the bad disk should be initiated through ThoughtSpot Support at your earliest convenience.

Symptoms

You should suspect disk failure if you observe these symptoms:
• Performance degrades significantly.
• You receive alert emails beginning with WARNING or CRITICAL that contain DISK_ERROR in the subject.

If you notice these symptoms, contact ThoughtSpot Support.

Disk replacement

The guidelines for disk replacement are:
• Losing one or two disks: The cluster continues to operate, but you should replace the disk(s) at your earliest convenience.
• Losing more than two disks: The cluster continues to operate, but the application may be inaccessible. Replace the disks to restore normal operation.

Disk replacement is done on site by ThoughtSpot Support. Disks can be replaced while ThoughtSpot is running. However, the disk replacement procedure involves a node restart, so a disruption of up to five minutes can occur, depending on which services are running on that node.

Node failure

ThoughtSpot replicates stored data, so when a node fails, ThoughtSpot continues to operate. To support high availability, your ThoughtSpot instance must have at least three nodes.
In a system with three or more nodes, if one node fails, its services are distributed to the other nodes. The failover is automatic. However, when a node fails, you should contact ThoughtSpot Support about replacing the node as soon as possible.

A node is considered to have failed when one or more of these conditions occur:
• Two or more disks have failed.
• An SSD has failed.
• Memory has failed.
• Another hardware component has failed (networking, motherboard, power supplies).

Symptoms

You should suspect node failure if you observe these symptoms:
• Performance degrades significantly.
• You receive alert emails beginning with WARNING or CRITICAL that describe problems with one of the nodes not running.
• A node does not come up upon booting or rebooting the system.

If you notice these symptoms, contact ThoughtSpot Support.

Node replacement

Node replacement is done on site by ThoughtSpot Support. You will need to schedule a maintenance window, since some downtime is required. For more information, contact ThoughtSpot Support.

Cluster replacement

Cluster replacement can be achieved using a mirrored system architecture, which allows you to recover an entire system quickly and without data loss.

You have the option of architecting your system for fast recovery from a disaster in which you lose an entire ThoughtSpot instance. This involves running two ThoughtSpot appliances in a mirrored configuration. This configuration is used in mission-critical systems or for business processes in which ThoughtSpot data has been operationalized. The two ThoughtSpot instances are called:
• Primary: The production ThoughtSpot instance.
• Mirror: A standby instance that can be placed into service in the event that the primary fails.

In this configuration, the primary initiates periodic full backups of itself and pushes the backups to a shared NAS (network attached storage). The mirror instance pulls the backups from the shared NAS at defined intervals and uses each new backup to restore itself to match the production cluster.

Figure 1: A ThoughtSpot disaster recovery configuration

Mount a NAS file system

Some operations, like backup/restore and data loading, require you to read or write large files. You can mount a NAS (network attached storage) file system for these operations.

This procedure shows you how to mount a NAS file system for storing or accessing large files. The file system is mounted at the same location on each node in the cluster automatically. When any node is restarted, the file system is remounted automatically, if it can be found. When supplying a directory for writing or reading a backup, you can specify the mount point as the directory to use. Likewise, you can stage data there for loading.

Note that backups are written by the Linux user "admin". If that user does not have permission to write to the NAS file system, you can write the backups to local disk (for example /export/sdc1, /export/sdd1, /export/sde1, or /export/sdf1) and then set up a cron job that runs as the root user, copies the backup to the NAS device every night, and then deletes it from the local directory, as in the sketch below.
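The following is a minimal sketch of such a cron job. The staging directory, NAS mount point, script path, and schedule are illustrative assumptions, not ThoughtSpot defaults; adjust them to your environment.

   # /etc/cron.d/ts-backup-to-nas (hypothetical file): run the copy script
   # as root every night at 03:00.
   0 3 * * * root /usr/local/bin/copy_backups_to_nas.sh

   # /usr/local/bin/copy_backups_to_nas.sh (hypothetical helper script)
   #!/bin/bash
   SRC=/export/sdc1/backups          # local staging directory (assumption)
   DEST=/mnt/thoughtspot_backups     # NAS mount point (assumption)
   # Copy each staged backup to the NAS, deleting the local copy only if
   # the copy succeeded.
   for b in "$SRC"/*; do
     [ -e "$b" ] || continue         # skip if nothing was staged
     cp -r "$b" "$DEST"/ && rm -rf "$b"
   done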
Do not send the periodic backups to, or stage files on, /export/sdb1: it is a name node drive, used internally by the Hadoop Distributed File System (HDFS), and if it fills up, it can cause serious problems. Also, do not allow backups or data files to accumulate on ThoughtSpot; if disk space becomes limited, the system will not function normally.

To mount a NAS file system:

1. Log in to the Linux shell using SSH.
2. Mount the directory to the file system by issuing the appropriate command, replacing the angle-bracket placeholders with your own values:
   • For an NFS (Network File System) directory:
     tscli nas mount-nfs --server <server> --path_on_server <path_on_server> --mount_point <mount_point>
   • For a CIFS (Common Internet File System) directory:
     tscli nas mount-cifs --server <server> --path_on_server <path_on_server> --mount_point <mount_point> --username <username> --password <password> --uid <uid> --gid <gid>
3. Use the mounted file system as you wish, referring to it by its mount point.
4. When you are finished with it, you can optionally unmount the NAS file system:
   tscli nas unmount --dir <mount_point>
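For example, mounting an NFS export might look like the following; the server address and paths are hypothetical values, not defaults:

   tscli nas mount-nfs --server 192.168.2.50 --path_on_server /exports/ts_backups --mount_point /mnt/thoughtspot_backups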
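Because the file system should appear at the same mount point on every node, you may also want to confirm the mount cluster-wide after issuing the command. The sketch below is not a ThoughtSpot-provided tool; it assumes passwordless SSH to each node as the admin user, and the node IPs are examples:

   #!/bin/bash
   # Verify that the NAS mount point is active on every node (IPs are examples).
   MOUNT_POINT=/mnt/thoughtspot_backups
   for node in 192.168.2.111 192.168.2.112 192.168.2.113; do
     if ssh admin@"$node" mountpoint -q "$MOUNT_POINT"; then
       echo "$node: mounted"
     else
       echo "$node: NOT mounted"
     fi
   done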
Set up a disaster recovery configuration

Use this procedure to set up a disaster recovery configuration with a primary and a mirror instance.

The disaster recovery setup uses periodic backups from the primary to shared storage (an NFS-mounted drive or NAS). Note that if you do not use the tscli command to mount the NAS, you must make sure the NAS is mounted on all nodes of both clusters.

When choosing times and frequencies for periodic backups, choose a reasonable frequency. Do not schedule backups too close together, since a backup cannot start while another backup is still running. Avoid backing up when the system is under heavy load, such as during peak usage or a large data load.

This is the procedure for designating a primary and a mirror ThoughtSpot instance:

1. Ensure you have tscli on the target appliance. If not, contact ThoughtSpot Support. In addition, the target appliance should not be running a cluster; if one exists, contact ThoughtSpot Support to delete it. A ThoughtSpot cluster should be up and running on the source appliance.
2. Log in to the primary appliance and set it up to take periodic backups and write them to the shared backup directory (on a SAN or shared NFS-mounted drive):
   $ tscli backup set-periodic --at <HH,HH,...> --directory <backup_directory> [--num_backups <number>]
   For example:
   $ tscli backup set-periodic --at 01,17 --directory /mnt/thoughtspot_backups --num_backups 5
3. Run tscli backup periodic-config to check whether the periodic backup is set correctly.
4. Designate the cluster on the mirror appliance to act as a mirror:
   $ tscli backup start-mirror <backup_directory> <node_IPs> <cluster_name> <cluster_id>
   For example:
   $ tscli backup start-mirror /mnt/thoughtspot_backups 192.168.2.111,192.168.2.112,192.168.2.113 thoughtspot_mirror thoughtspot-mirror-29543
5. It may take some time for the cluster to begin acting as a mirror. Issue this command to verify that the cluster has started running in mirror mode:
   $ tscli backup mirror-status
   You can use this command in the future to check whether the mirror cluster is up to date.

Recover a cluster from a disaster

If the primary cluster fails, the mirror cluster can take over its operations after a small manual intervention: the manual procedure makes the mirror instance into the primary.

In the event that the production cluster is destroyed, monitoring and alerting will notify the administrator. The administrator can then make the mirror the new primary by stopping it from pulling backups generated by the old primary. These steps define the mirror cluster as the new primary; afterwards, a new mirror can be deployed.

1. If it is still running, disconnect the old primary from the network.
2. Log in to the mirror and issue the command to stop it from acting as a mirror:
   $ tscli backup stop-mirror
3. It may take some time for the mirror to finish recovering the last backup from the primary. Issue this command to verify that the cluster has stopped running in mirror mode before continuing:
   $ tscli backup mirror-status
4. Start periodic backups on the mirror (which is now the new primary), as shown in the example after this procedure:
   $ tscli backup set-periodic --at <HH,HH,...> --directory <backup_directory> [--num_backups <number>]
5. Deploy a new mirror appliance when possible.
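For example, to resume the schedule from the earlier setup example (the times, directory, and retention count below are the same illustrative values used there, not defaults):

   $ tscli backup set-periodic --at 01,17 --directory /mnt/thoughtspot_backups --num_backups 5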