Thought Spot Administration Guide 4.2

User Manual: Pdf

Open the PDF directly: View PDF PDF.
Page Count: 320 [warning: Documents this large are best viewed by clicking the View PDF Link!]

ThoughtSpot
Administration Guide
Version 4.2
February 2017
Table of Contents
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 2
Contents
Chapter 1: Installation and setup............................................................ 8
Login credentials for administration........................................................................... 10
Log in to the Linux shell using SSH............................................................................ 10
Log in to ThoughtSpot from a browser......................................................................11
Set your ThoughtSpot locale..........................................................................................12
Software updates.................................................................................................................12
Test network connectivity between nodes............................................................... 12
Set the relay host for SMTP (email)............................................................................ 13
Verify that email is working.................................................................................13
Set up a fiscal calendar year..........................................................................................14
About SSL (secure socket layers)................................................................................ 15
Configure SSL for web traffic............................................................................. 15
Set the recommended TLS version.................................................................. 16
Configure SAML................................................................................................................... 17
About LDAP integration................................................................................................... 18
Configure OpenLDAP............................................................................................. 18
Configure LDAP for Active Directory............................................................. 20
Add the SSL certificate for LDAP.....................................................................22
Test the LDAP configuration...............................................................................23
Sync users and groups from LDAP..................................................................23
Mount a NAS file system.................................................................................................25
Add a custom support contact.................................................................................... 27
Set up monitoring..............................................................................................................29
Connect to the ThoughtSpot Support file server................................................. 30
Set up remote support access.......................................................................................31
Enable the call home capability...................................................................................32
Network ports......................................................................................................................33
Table of Contents
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 3
About load balancing and proxies............................................................................. 40
Chapter 2: Load and manage data.....................................................43
About case configuration............................................................................................... 46
Generate CSV files with the data to be loaded..................................................... 46
Load data from a web browser....................................................................................47
Append data from a web browser............................................................................. 50
Plan the schema..................................................................................................................52
Data types..................................................................................................................54
Constraints..................................................................................................................57
Sharding......................................................................................................................60
Chasm traps...............................................................................................................63
Chasm trap limitations..........................................................................................64
Build the schema................................................................................................................66
About TQL, the SQL command line interface..............................................67
Connect to the database with the ThoughtSpot SQL Command Line
(TQL)....................................................................................................................... 67
Create the schema in TQL.................................................................................. 68
Write a SQL script to create the schema......................................................69
Schema creation examples................................................................................. 70
Import a schema (use the SQL editor).......................................................... 73
Change the schema...........................................................................................................75
Change the primary key for a table................................................................ 75
Change a relationship between tables........................................................... 76
Change sharding on a table............................................................................... 78
About data type conversion...............................................................................79
Load data with ThoughtSpot Loader.........................................................................83
Import CSV files with ThoughtSpot Loader.................................................84
Use a script to load data.....................................................................................85
Bulk load files in parallel......................................................................................86
Table of Contents
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 4
Delete a data source.........................................................................................................88
Delete a data source from the browser.........................................................89
Delete or change a table in TQL...................................................................... 92
About the Schema Viewer............................................................................................. 93
Chapter 3: Model, link, and tag your data for searching............. 97
Model the data for searching........................................................................................98
Model data in the ThoughtSpot application................................................ 98
Model data in bulk in the modeling file....................................................... 100
Data modeling settings.......................................................................................104
Link tables using relationships.................................................................................... 123
Create a relationship............................................................................................ 124
Delete a relationship.............................................................................................128
About stickers.................................................................................................................... 129
Create stickers........................................................................................................ 130
Apply a sticker.........................................................................................................131
Filter by a sticker...................................................................................................132
Chapter 4: Simplify searching with worksheets........................... 134
Create a new worksheet................................................................................................ 137
Add sources and columns to a worksheet.................................................. 137
How the inclusion rule works............................................................................ 141
How the worksheet join rule works................................................................143
About formulas in worksheets.........................................................................150
Edit a worksheet............................................................................................................... 153
Rename a worksheet or table.......................................................................... 154
Change the inclusion or join rule for a worksheet....................................155
Delete a worksheet or table.........................................................................................157
Chapter 5: Manage users, groups, and privileges........................160
Table of Contents
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 5
About privileges................................................................................................................ 162
Add a group and set security privileges.................................................................163
Edit or delete a group....................................................................................................166
Add a user........................................................................................................................... 167
Add multiple users to a group....................................................................................169
Edit or delete a user....................................................................................................... 170
Forgotten password..........................................................................................................171
Chapter 6: Manage jobs......................................................................... 172
About scheduled pinboards......................................................................................... 173
Schedule a pinboard job.....................................................................................175
Scheduled pinboards management................................................................178
Chapter 7: About security.....................................................................182
System security..................................................................................................................183
Get audit logs..........................................................................................................183
Security policies..................................................................................................... 185
Data security.......................................................................................................................186
Share tables and columns.................................................................................. 187
Share worksheets.................................................................................................. 190
Share a pinboard....................................................................................................192
Revoke access (unshare)....................................................................................194
Row level security................................................................................................. 196
Network security............................................................................................................... 212
Chapter 8: System administration...................................................... 213
System monitoring........................................................................................................... 214
About the Space Utilization chart................................................................. 220
Generate and send a log bundle............................................................................... 222
Send logs to the administrator...................................................................................222
Table of Contents
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 6
Set up recording for Replay Search.........................................................................223
Chapter 9: Backup and restore.......................................................... 228
About backups..................................................................................................................230
Take a snapshot.....................................................................................................232
Configure periodic snapshots.......................................................................... 233
Take a backup of a snapshot...........................................................................235
Configure periodic backups............................................................................. 236
About restore operations..............................................................................................238
Chapter 10: About troubleshooting................................................. 240
Get logs.................................................................................................................................241
Upload logs to ThoughtSpot Support......................................................... 243
Network connectivity issues....................................................................................... 244
Change the timezone.....................................................................................................245
Browser untrusted connection error........................................................................245
Characters not displaying correctly.........................................................................246
Clear the browser cache...............................................................................................247
Cannot open a saved answer that contains a formula..................................... 249
Data loading too slowly................................................................................................. 251
Search results contain too many blanks.................................................................252
Chapter 11: Reference............................................................................. 254
TQL reference....................................................................................................................255
ThoughtSpot Loader flag reference.........................................................................266
tscli command reference.............................................................................................. 270
Formula reference........................................................................................................... 290
Date and time formats reference..............................................................................305
Row level security rules reference............................................................................307
Table of Contents
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 7
Contact ThoughtSpot..............................................................................317
Open source software.............................................................................319
Copyright.....................................................................................................320
Installation and setup
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 8
Chapter 1: Installation and setup
Installation and setup
Topics:
Login credentials for
administration
Log in to the Linux shell
using SSH
Log in to ThoughtSpot
from a browser
Set your ThoughtSpot
locale
Software updates
Test network connectivity
between nodes
Set the relay host for
SMTP (email)
Set up a fiscal calendar
year
About SSL (secure socket
layers)
Configure SAML
About LDAP integration
Mount a NAS file system
Add a custom support
contact
Set up monitoring
This ThoughtSpot Administrator Guide will walk
you through the basic steps required to set up and
configure ThoughtSpot. It will also assist you in
troubleshooting some common problems, finding
additional resources, and contacting ThoughtSpot.
ThoughtSpot enables you to access and analyze
your data through a search-based user interface. You
can create your searches on the fly by typing into a
search bar, like you do when using an internet search
engine. ThoughtSpot makes it easy to see your data,
get your questions answered, create interactive
graphs, and customize pinboards. You do not need to
understand how the data is stored or know SQL to do
these things.
ThoughtSpot gives administrators the ability to
modify data properties to meet business needs, for
example by providing search synonyms for common
terms, boosting the importance of a column in
search results, or formatting how the data appears.
Collaboration and security features make it easy for
you to protect sensitive data and for users to share
information safely with others.
Installation and setup
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 9
Connect to the
ThoughtSpot Support file
server
Set up remote support
access
Enable the call home
capability
Network ports
About load balancing and
proxies
Installation and setup
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 10
Login credentials for administration
You will need administrative permissions to perform the actions discussed in this
guide. You can access ThoughtSpot via SSH at the command prompt and from a
Web browser.
There are two separate default administrator users, an operating system user
that you type in at the Linux shell prompt, and an application user for access
through a browser. Make sure you use the correct login and password for the
method you are using to log in. Passwords are case sensitive.
Table 1: Default administrative user credentials
Login Type User Access Method Password
OS user admin Access remotely via SSH from
the command prompt on a
client machine.
Contact ThoughtSpot to obtain
the default password.
Application user tsadmin Access through a Web
browser.
Contact ThoughtSpot to obtain
the default password.
Log in to the Linux shell using SSH
To perform basic administration such as checking network connectivity, starting
and stopping services, and setting up email, log in remotely as the Linux
administrator user "admin". To log in with SSH from a client machine, you can
use the command shell or a utility like Putty.
In the following procedure, replace <hostname_or_IP> with the hostname or IP
address of a node in ThoughtSpot. The default SSH port (22) will be used.
1. Log in to a client machine and open a command prompt.
2. Issue the SSH command, specifying the IP address or hostname of the
ThoughtSpot instance:
ssh admin@<hostname_or_IP>
Installation and setup
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 11
3. Enter the password for the admin user.
Log in to ThoughtSpot from a browser
To set up and explore your data, access ThoughtSpot from a standard Web
browser using a username and password.
Before accessing ThoughtSpot, you need:
The Web address (IP address or server name) for ThoughtSpot.
A network connection.
A Web browser.
A username and password for ThoughtSpot.
Supported Web browsers include:
Table 2: Supported browsers
Browser Version Operating System
Google Chrome 20 and above Windows 7 or greater
• Linux
• MacOS
Mozilla Firefox 14 and above Windows 7 or greater
• Linux
• MacOS
Internet Explorer 11 Windows 7 or greater
To log in to ThoughtSpot from a browser:
1. Open the browser and type in the Web address for ThoughtSpot:
http://<hostname_or_IP>
2. Enter your username and password and click Enter Now.
Installation and setup
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 12
Set your ThoughtSpot locale
In addition to American English, ThoughtSpot also supports German and
Japanese.
The language displayed in ThoughtSpot is based off of your browser locale. So
if you set Japanese as your default language in your browser settings, then the
interface will update to reflect that after you refresh your page.
Keywords, operators, and error messages are included in the translated material.
Formulas, however, are not translated. Also, all metadata will remain as user
inputted.
This feature is supported on all browsers that support ThoughtSpot.
To set your ThoughtSpot locale:
1. Go to the settings page of your browser.
2. Change your default language to one that is supported by ThoughtSpot.
3. Save your settings.
4. Refresh your ThoughtSpot browser page.
Your ThoughtSpot interface should reflect your new chosen language.
Software updates
The ThoughtSpot software is updated by ThoughtSpot Support.
ThoughtSpot Support will contact you to schedule an update when one
becomes available.
Test network connectivity between nodes
This procedure tests the network connectivity between the ThoughtSpot nodes,
and to the LAN. If you can perform these steps successfully, the network settings
on ThoughtSpot are correct.
Installation and setup
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 13
1. Log in to the Linux shell using SSH.
2. Ping each of the other nodes in the cluster.
3. Ping another machine that exists outside of the cluster, for example, a
machine that you will use to stage data to be loaded.
Note: If you cannot perform these tests successfully, there is a problem
with the network setup.
4. If the tests fail, check Network connectivity issues.
Set the relay host for SMTP (email)
To enable alert emails, you'll need to set up a relay host for SMTP traffic
from ThoughtSpot. This routes the alert and notification emails coming from
ThoughtSpot through an SMTP email server.
To set up a relay host:
1. Log in to the Linux shell using SSH.
2. Issue the setup command, providing the IP address of the relay host:
$ tscli smtp set-relayhost <IP_address>
3. Verify your settings:
$ tscli smtp show-relayhost
4. Verify that email is working.
Verify that email is working
Check if the email settings are working properly by using this procedure.
ThoughtSpot uses emails for sending critical notifications to ThoughtSpot
Support.
Installation and setup
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 14
ThoughtSpot sends alerts to the email address specified during installation. If no
email address was entered, no alerts will be sent. But you can add an email to
receive alerts by issuing:
$ tscli monitoring set-config --email <your_email>
You can add a list of email addresses separated by commas, with no spaces.
To verify that the send email function is working correctly:
1. Log in to the Linux shell using SSH.
2. Try sending an email to yourself by issuing:
$ echo | mail -s Hello <your_email>
3. If you receive the email at the address you supplied, email is working correctly.
Set up a fiscal calendar year
Many companies start their fiscal calendar in a month other than January. If
this is the case in your company, setting a fiscal calendar quarter makes the
ThoughtSpot date searches reflect your fiscal year.
When you set a custom fiscal year, you will designate the month on which your
company's fiscal year begins. All the date language will then reflect your change,
so if someone searches for this quarter or q3, the answer will conform to the
fiscal quarter in use. When you make this change, existing pinboards also change
to reflect the custom fiscal calendar. Because of this, if you make this change
after your users have been using ThoughtSpot for any period of time, you should
alert them of the change you will be making and how it affects previous saved
searches.
Call ThoughtSpot Support, so they can help you set the custom fiscal year.
Installation and setup
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 15
About SSL (secure socket layers)
You should use SSL (secure socket layers) for sending data to and from
ThoughtSpot. SSL provides authentication and data security. This section applies
to both SSL to enable secure HTTP and secure LDAP.
Many IT departments require SSL for their applications that access data. To use
SSL with ThoughtSpot, you'll need your company's own SSL certificate. The
certificate is issued per domain, so if you want to use SSL for both HTTP and
LDAP, you will need two separate certificates - one for the HTTP domain and one
for the LDAP domain.
If you do not have an SSL certificate:
Check with your IT department to see if they already have an SSL certificate
you can use.
If not, you will need to obtain the certificate from an issuing authority.
Alternatively, you may disable SSL if you don't want the security it provides by
using the command tscli ssl off.
There are many SSL vendors to choose from. Check with your existing Web
hosting provider first, to see if they can provide the certificate for you.
When you apply for the SSL certificate, you may specify a SAN, wildcard, or
single domain certificate. Any of these can work with ThoughtSpot.
Configure SSL for web traffic
This procedure shows how to add SSL (secure socket layers) to enable secure
HTTP (HTTPS) in ThoughtSpot.
To set up SSL, you will need:
The SSL certificate
The private key
To install the SSL certificate:
Installation and setup
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 16
1. Follow the instructions from your certifying authority to obtain the certificate.
This is usually sent via email or available by download.
2. Copy the certificate and key files to ThoughtSpot:
$ scp <key> <certificate> admin@<IP_address>:<path>
3. Log in to the Linux shell using SSH.
4. Change directories to where you copied the certificate:
$ cd <path>
5. Issue the tscli command to install the certificate:
$ tscli ssl add-cert <key> <certificate>
6. To test that the certificate was installed correctly, Log in to ThoughtSpot from
a browser.
You should see that the URL begins with https://.
Set the recommended TLS version
There are a couple of security vulnerabilities due to SSL certificates supporting
older versions of TLS (Transport Layer Security). This procedure shows you how
to set the recommended TLS version to avoid these vulnerabilities.
The PCI (Payment Card Industry) Data Security Standard and the FIPS 140-2
Standard require a minimum of TLS v1.1 and recommends TLS v1.2.
ThoughtSpot supports SSL v3, TLS v1.0, and TLS v1.1 for backwards compatibility.
However, the recommended version is TLS v1.2. Therefore, to set the
recommended TLS version:
1. Enable your web browser to support TLS v1.2. This can be done in your
browser's advanced settings.
2. Log in to the Linux shell using SSH.
3. Issue the following command:
tscli security set-min-version 1.2
Installation and setup
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 17
This will block all usage of older versions.
Configure SAML
ThoughtSpot can use Security Assertion Markup Language (SAML) to
authenticate users. You can set up SAML through the shell on ThoughtSpot using
a tscli based configurator.
Before configuring SAML, you will need this information:
IP of the server where your ThoughtSpot instance is running.
Port of the server where your ThoughtSpot instance is running.
Protocol, or the authentication mechanism for ThoughtSpot.
Unique service name that is used as the unique key by IDP to identify the
client.
It should be in the following format: urn:thoughtspot:callosum:saml
Allowed skew time, which is the time after authentication response is rejected
and sent back from the IDP. It is usually set to 86400.
The absolute path to the idp-meta.xml file. This is needed so that the
configuration persists over upgrades.
This configurator also checks with the user if internal authentication needs to
be set or not. This internal authentication mechanism is used to authenticate
tsadmin, so set it to true if you do not know what it does.
Use this procedure to set up SAML on ThoughtSpot for user authentication. Note
that this configuration persists across software updates, so you do not need to
reapply it if you update to a newer release of ThoughtSpot.
1. Log in to the Linux shell using SSH.
2. Execute the command to launch the interactive SAML configuration:
tscli saml configure
3. Complete the configurator prompts with the information you gathered above.
Installation and setup
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 18
4. When the configuration is complete, open a Web browser and go to the
ThoughtSpot login page. It should now show the Single Sign On option.
About LDAP integration
Some companies use LDAP (Lightweight Directory Access Protocol) to manage
user authentication. Using LDAP provides security and makes user management
more centralized.
ThoughtSpot can be configured to authenticate users against an LDAP
server. You can choose to authenticate users against an LDAP server, against
ThoughtSpot, or both.
ThoughtSpot supports both anonymous and non-anonymous LDAP integration.
Non-anonymous LDAP binding is more rigorous than anonymous authentication,
but it should help you track what your users are querying and keep a log trace
for auditing purposes.
If you have been using ThoughtSpot with users you created manually, and you
now want to transition to LDAP, please contact ThoughtSpot Support. They can
assist you in migrating existing users to their LDAP equivalents.
ThoughtSpot supports these types of LDAP servers:
OpenLDAP
Active Directory
Configure OpenLDAP
Use this procedure to set up integration with LDAP using OpenLDAP.
Before configuring OpenLDAP, you will need this information:
URL to connect to OpenLDAP
For example, ldap://192.168.2.48:389
Distinguished Name template
Installation and setup
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 19
The template for usernames, for example cn={0},ou=users,dc=thoughtspot,dc=com
Automatically add LDAP users in ThoughtSpot?
If you choose 'yes' for this, when a user is authenticated against LDAP, if that
user does not exist in ThoughtSpot, the user is automatically created. When
users are created in this way, their passwords exist only in LDAP and are not
stored in ThoughtSpot.
If you choose 'no' for this, users who will authenticate against LDAP have to
be manually created with a dummy password as a placeholder in ThoughtSpot
before they can log in. In order to log in to ThoughtSpot, the user has to exist
in ThoughtSpot independent of whether that user is authenticated against
LDAP or against ThoughtSpot's internal authentication.
Also use ThoughtSpot internal authentication?
If you choose 'yes' for this, when a user logs in, ThoughtSpot will first attempt
to authenticate the user against LDAP. If that attempt fails, it will then attempt
to authenticate the user against ThoughtSpot. If either of these succeed, then
the user is successfully logged in. This option is useful in scenarios where
some users are not in LDAP and are created only in ThoughtSpot.
You do not need to create a user called tsadmin on your LDAP server. Internal
authentication can be used for tsadmin. To configure LDAP for OpenLDAP:
1. Log in to the Linux shell using SSH.
2. Run the command to configure LDAP:
$ tscli ldap configure
3. Answer the prompts using the information you collected. For example:
Choose the LDAP protocol:
[1] Active Directory
[2] OpenLDAP
Option number: 2
Configuring Open LDAP
Installation and setup
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 20
URL to connect to OpenLDAP (Example: ldap://192.168.2.100:389):
ldap://192.168.2.48:389
Distinguished name template (Example: cn={0},ou=users,dc=thoughtspot,dc=com):
cn={0},ou=users,dc=thoughtspot,dc=com
Automatically add LDAP users in ThoughtSpot (y/n): n
Also use ThoughtSpot internal authentication (y/n): y
4. If you are using SSL, Add the SSL certificate for LDAP.
5. If you want to remove the LDAP configuration, issue:
$ tscli ldap purge-configuration
Configure LDAP for Active Directory
Use this procedure to set up integration with LDAP using Active Directory.
Before you configure LDAP for Active Directory, collect this information:
URL to connect to Active Directory.
For example, ldap://192.168.2.48:389
Default LDAP domain.
The default domain is the domain under which users who want to be
authenticated against Active Directory reside. When a user logs in with a
username, the default domain is added to the username before sending it to
the LDAP server. If users reside in multiple domains, you can still designate
one of them as the default. Users belonging to a non-default domain will
have to explicitly qualify their username when they log in, for example:
username@ldap1.thoughtspot.com.
Whether you will use SSL.
If yes, you'll need the certificate from the issuing authority.
LDAP search base.
This prompt adds the search base information that allows ThoughtSpot to find
user properties such as email and displayname from LDAP.
Automatically add LDAP users in ThoughtSpot?
Installation and setup
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 21
If you choose 'yes' for this, when a user is authenticated against LDAP, if that
user does not exist in ThoughtSpot, then the user is automatically created.
When users are created in this way, their passwords exist only in LDAP and are
not stored in ThoughtSpot.
In order to log in to ThoughtSpot, the user has to exist in ThoughtSpot
independent of whether that user is authenticated against LDAP or against
ThoughtSpot's internal authentication. If you choose 'no' for this, users
who will authenticate against LDAP have to be manually created with a
dummy password as a placeholder in ThoughtSpot before they can log
in. The username you specify when creating the LDAP authenticated
user manually in ThoughtSpot has to be domain qualified, for example:
username@ldap1.thoughtspot.com.
Also use ThoughtSpot internal authentication?
If you choose 'yes' for this, when a user logs in, ThoughtSpot will first attempt
to authenticate the user against LDAP. If that attempt fails, it will then attempt
to authenticate the user against ThoughtSpot. If either of these succeed, then
the user is successfully logged in. This option is useful in scenarios where
some users are not in LDAP and are created only in ThoughtSpot.
You do not need to create a user called tsadmin on your LDAP server. Internal
authentication can be used for tsadmin. To configure LDAP for OpenLDAP:
1. Log in to the Linux shell using SSH.
2. Run the command to configure LDAP:
$ tscli ldap configure
3. Answer the prompts using the information you collected. For example:
Choose the LDAP protocol:
[1] Active Directory
[2] OpenLDAP
Option number: 1
Configuring Active Directory
Installation and setup
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 22
URL to connect to Active Directory. (Example: ldap://192.168.2.100:389):
ldap://192.168.2.100:389
Default domain (Example: ldap.thoughtspot.com): ldap.thoughtspot.com
Use SSL (LDAPS) (y/n): n
LDAP search base (Example: cn=Users): cn=Users
Automatically add LDAP users in ThoughtSpot (y/n): y
Also use ThoughtSpot internal authentication (y/n): y
4. If you are using SSL, Add the SSL certificate for LDAP.
5. If you want to remove the LDAP configuration, issue:
$ tscli ldap purge-configuration
Add the SSL certificate for LDAP
When you set up LDAP, you specified whether or not to use SSL for LDAP
(LDAPS). If using SSL, you must install the LDAP SSL certificate.
Before you can add the SSL certificate, you must LDAP using one of these
procedures:
Configure OpenLDAP
Configure LDAP for Active Directory
You must have the SSL certificate before you start. For more information on
obtaining an SSL certificate, see About SSL (secure socket layers).
To add the SSL certificate for LDAP:
1. Follow the instructions from your certifying authority to obtain the certificate.
This is usually sent via email or available by download.
2. Copy the certificate to ThoughtSpot:
$ scp <certificate> admin@<IP_address>:<path>
3. Log in to the Linux shell using SSH.
4. Change directories to where you copied the certificate:
$ cd <path>
Installation and setup
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 23
5. Run the command to configure SSL for LDAP, designating an alias for this
certificate using the <name> parameter:
$ tscli ldap add-cert <name> <certificate>
Test the LDAP configuration
After configuring LDAP, you can test to make sure it is working by issuing a
command.
This procedure allows you to test the LDAP connection you created.
1. Log in to the Linux shell using SSH.
2. Issue the LDAP testing command, supplying the information for the LDAP
server you configured, as in this example:
$ ldapsearch -x -h 192.168.2.61 -p 389 -D "testuser@ldap.thoughtspot.com" -W
-b "dc=ldap,dc=thoughtspot,dc=com" cn
3. Supply the LDAP password when prompted.
4. If the connection works, you'll see a confirmation message.
Sync users and groups from LDAP
Use this procedure to synchronize your ThoughtSpot system with an LDAP
server.
Before synchronizing users and groups, you will need this information:
IP address and port of the server where your ThoughtSpot instance is running.
This hostport is needed in the following format http(s)://<host>:<port> or
http(s)://<domain>.
Administrator login username and password for your ThoughtSpot instance.
URL of the LDAP server, or hostport.
For example, ldap://192.168.2.48:389
Login username and password for the LDAP system.
An example username would be moo_100@ldap.thoughtspot.com
Installation and setup
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 24
Distinguished Name (DN) for the base to start searching for users in the LDAP
system.
For example, DC=ldap,DC=thoughtspot,DC=com
There are two ways for you to fetch users and groups from LDAP and populate
them into your ThoughtSpot system:
Run the synchronization script in interactive mode, which will walk you
through the process (shown here).
Create your own Python script by using the ThoughtSpot Python APIs. If you
need details on the Python APIs, contact ThoughtSpot Support. If you choose
this method, you can run the script periodically using a cron job.
To run the LDAP sync script in interactive mode:
1. Log in to the Linux shell using SSH.
2. Run the command to start the script:
python syncUsersAndGroups.py interactive
3. Answer the prompts using the information you collected above. For example:
Complete URL of TS server in format "http(s)://<host>:<port>":
http://10.77.145.24:8088
Disable SSL authentication to TS server (y/n): y
Login username for ThoughtSpot system: admin
Login password for ThoughtSpot system: 12345
Complete URL of server where LDAP server is running in format ldap(s)://
<host>:<port>: ldap://192.168.2.48:389
Login username for LDAP system: moo_100@ldap.thoughtspot.com
Login password for LDAP system: 12345
Syncs user and groups between LDAP and TS systems (y/n): y
Delete entries in ThoughtSpot system that are not currently in LDAP tree
being synced (y/n): n
Distinguished name for the base to start searching groups in LDAP System:
DC=ldap,DC=thoughtspot,DC=com
Scope to limit the search to (choice number)
0:base Searching only the entry at the base DN
1:one Searching all entries on level under the base DN - but not including
the base DN
2:tree Searching of all entries at all levels under and including the
specified base DN: 2
Filter string to apply the search to: (|(CN=TestGroupAlpha)
(CN=TestGroupBeta))
Installation and setup
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 25
Answering this prompt is optional. If left blank, the default value of '(CN=*)'will
be used.
Apply sync recursively, i.e. Iterates through group members and creates
member groups, users and relationships in a recursive way. (y/n): n
This prompt is asking if you would like to include group members even if they
do not belong to the current sub tree that is being synced.
4. Alternatively, to input your own shorthand script commands:
1. Issue the Python script commands, supplying all of the above information,
following this format example:
python syncUsersAndGroups.py script \
–-ts_hostport <ts_hostport> \
--disable_ssl \
--ts_uname <ts_username> \
--ts_pass <ts_password> \
--ldap_hostport '<ldap_hostport>' \
--ldap_uname '<ldap_username>' \
--ldap_pass '<ldap_password>' \
--sync \
--purge \
--basedn 'DC=ldap,DC=thoughtspot,DC=com' \
--filter_str '(|(CN=TestGroupAlpha)(CN=TestGroupBeta))' \
--include_nontree_members
The bottom half of the above command example targets sub trees under
the DC called TestGroupAlpha and TestGroupBeta, and iterates through
them recursively to create/sync users, groups, and their relationships in
the ThoughtSpot system. It also deletes any other entities created in the
ThoughtSpot system from this LDAP system that are not currently being
synced.
Mount a NAS file system
Some operations, like backup/restore and data loading, require you to either
read or write large files. You can mount a NAS (network attached storage) file
system for these operations.
Installation and setup
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 26
This procedure shows you how to mount a NAS file system for storing or
accessing large files. The file system will be mounted at the same location on
each node in the cluster automatically. When any node is restarted, the file
system will be mounted again automatically, if it can be found.
When supplying a directory for writing or reading a backup, you can specify
the mountpoint as the directory to use. Likewise, you can stage data there for
loading.
Note that backups are written by the Linux user "admin". If that user does not
have permission to write to the NAS file system, you could write the backups to
disk (for example /export/sdc1, /export/sdd1, /export/sde1, or /export/sdf1) and
then set up a cron job that executes as root user and copies the backup to the
NAS device every night, then deletes it from the directory.
Do not send the periodic backups or stage files on /export/sdb1 since it is a
name node. It is used internally by Hadoop Distributed File System (HDFS) and
if this drive fills up, it can cause serious problems. Do not allow backups or data
files to accumulate on ThoughtSpot. If disk space becomes limited, the system
will not function normally.
1. Log in to the Linux shell using SSH.
2. Mount the directory to the file system, by issuing the appropriate command:
For an NFS (Network File System) directory:
tscli nas mount-nfs
--server <server_NFS_address>
--path_on_server <path>
--mount_point <target>
For a CIFS (Common Internet File System) directory:
tscli nas mount-cifs
--server <server_CIFS_address>
--path_on_server <path>
--mount_point <target>
--username <user>
--password <password>
--uid <uid>
--gid <gid>
Installation and setup
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 27
3. Use the mounted file system as you wish, specifying it by referring to its
mount point.
4. When you are finished with it, you may optionally unmount the NAS file
system:
tscli nas unmount --dir <directory>
Add a custom support contact
You can designate a support contact (email and phone number) at your
organization for first level technical support. That person can answer questions
about data and searching, and submit any system and software-related
questions to ThoughtSpot Support.
After you set the custom support contact information, here's where your users
will see it:
In the Help Center, when they click Contact Support.
Installation and setup
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 28
Figure 1: Help Center support contact
In error messages, when they click What Happened?
Figure 2: Error message support contact
Installation and setup
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 29
To designate the custom support contact:
1. Log in to the Linux shell using SSH.
2. Issue the tscli command to set the email address:
$ tscli support set-admin-email <email_address>
3. Issue the tscli command to set the phone number:
$ tscli support set-admin-phone <phone_number>
4. If you need to reset both of these to the default (ThoughtSpot Support), issue:
$ tscli support rm-admin-email
$ tscli support rm-admin-phone
Set up monitoring
To configure monitoring of your cluster, set up the frequency of heartbeat and
monitoring reports and an email address to receive them.
Use tscli to set up monitoring. This is a one time operation.
1. Log in to the Linux shell using SSH.
2. Issue the tscli command to set up monitoring:
tscli monitoring set-config
--email <email>
--heartbeat_interval <heartbeat_interval>
--report_interval <report_interval>
The parameters are:
--email <email> is a comma separated list (no spaces) of email addresses
where the cluster will send monitoring information.
--heartbeat_interval <heartbeat_interval> is the heartbeat email generation
interval in seconds. Must be greater than 0.
--report_interval <report_interval> sets the cluster report email generation
interval in seconds. Must be greater than 0.
Installation and setup
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 30
3. To view your settings and verify that they have been applied, issue:
tscli monitoring show-config
You should see information like:
Monitoring Configuration:
Alert Email: dev-alerts@thoughtspot.com
Heartbeat Interval: 900 sec
Report Interval: 21600 sec
4. After the heartbeat interval has passed, check your email to verify that emails
are being delivered.
5. If you don't receive any emails, Verify that email is working.
Connect to the ThoughtSpot Support file server
ThoughtSpot Support uses a secure file server to provide new releases and
to receive logs and troubleshooting files that you upload. The secure server
connection is also required if you want to enable the optional statistics collection
using the call home feature.
Before you can upload a file to the secure file server, obtain your user name
and password for logging in to the secure file server. You can get these from
ThoughtSpot Support.
Configuring the connection to the file server is a one time operation. You do not
need to reconfigure the connection unless your password changes. Note that
you can do a one time override of the user and password you used to configure
the connection. This is done by passing a different user and password on the
command line when uploading or downloading a file.
To configure the connection to the secure file server:
1. Log in to the Linux shell using SSH.
Installation and setup
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 31
2. Issue the command to configure the file server:
$ tscli fileserver configure --user <user_name> [--password <password>]
Note that if you do not use the optional --password parameter, you will be
prompted to enter the password.
Set up remote support access
You can set up a reverse tunnel to allow ThoughtSpot Support to get access to
your ThoughtSpot instance, to perform support-related activities. This setup is
a much simpler, more secure, and scalable than the alternative option of using a
virtual meeting room.
Before you can do this procedure, your networking team needs to open port 22
in your firewall outgoing rules.
Granting remote support access can streamline troubleshooting activities,
since it enables your support agent to work directly on your in a secure setting.
The remote tunnel enables SSH and HTTP access to your by ThoughtSpot
Support. This access can be granted and revoked easily, so you can enable it for
a troubleshooting session, and then disable it again. Before doing this procedure,
make sure it is allowed by your internal security policies.
To enable remote support:
1. Contact ThoughtSpot to open a support ticket for making the appropriate
reverse tunnel settings on our end. Provide the cluster name of the cluster for
which you want to enable remote support.
2. When the ticket has been completed, continue with the remaining steps in this
procedure to make the settings on your side.
3. Log in to the Linux shell using SSH.
4. Issue the command to configure the destination for the remote tunnel.
Installation and setup
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 32
You only need to do this once, when you are enabling the tunnel for the very
first time. After that, this setting persists when you start and stop the remote
tunnel.
$ tscli support set-remote --addr tunnel.thoughtspot.com --user ubuntu
5. Test that the setting has been applied:
$ tscli support show-remote
6. Enable the remote tunnel:
$ tscli support start-remote
7. Contact ThoughtSpot again, so you can test the setup with your ThoughtSpot
Support contact.
8. After your remote session with ThoughtSpot Support, you should turn the
remote tunnel off, until you need to use it again:
$ tscli support stop-remote
You can repeat the steps to start and stop the remote tunnel as needed for
future support operations.
9. Test that the remote tunnel has been disabled:
$ tscli support show-remote
Enable the call home capability
The optional "call home" capability sends usage statistics to ThoughtSpot
Support once a day via a secure file server.
Before you can enable the call home feature:
1. Configure the connection to the file server.
2. Obtain the customer name as recognized by the file server.
The customer name is formatted like this example: Shared/<customer_name>/stats. If
you do not know the customer name, contact ThoughtSpot Support.
Installation and setup
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 33
This can be helpful when troubleshooting problems with ThoughtSpot Support,
because they will be able to see basic usage information over time for your
ThoughtSpot instance.
To set up the call home feature:
1. Log in to the Linux shell using SSH.
2. Enable the call home feature by issuing:
$ tscli callhome enable --customer_name <customer_name>
3. If you want to disable call home in the future, you can do so by issuing:
$ tscli callhome disable
Network ports
For regular operations and for debugging, there are some ports you will need to
keep open to network traffic from end users. Another, larger list of ports must be
kept open for network traffic between the nodes in the cluster.
Required ports for operations and debugging
The following ports need to be opened up to requests from your user
population. There are two main categories: operations and debugging.
Table 3: Network ports to open for operations
Port Protocol Service Name Direction Source Destination Description
22 SSH SSH bidirectional Administrators
IP addresses
All nodes Secure shell access.
Also used for scp
(secure copy).
80 HTTP HTTP bidirectional All users IP
addresses
All nodes Hypertext Transfer
Protocol for website
traffic.
Installation and setup
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 34
Port Protocol Service Name Direction Source Destination Description
443 HTTPS HTTPS bidirectional All users IP
addresses
All nodes Secure HTTP.
12345 TCP Simba bidirectional Administrators
IP addresses
All nodes Port used by ODBC
and JDBC drivers
when connecting to
ThoughtSpot.
Table 4: Network ports to open for debugging
Port Protocol Service Name Direction Source Destination Description
2201 HTTP Orion master
HTTP
bidirectional Administrator
IP addresses
All nodes Port used to debug
the cluster manager.
2101 HTTP Oreo HTTP bidirectional Administrator
IP addresses
All nodes Port used to debug
the node daemon.
4001 HTTP Falcon worker
HTTP
bidirectional Administrator
IP addresses
All nodes Port used to debug
the data cache.
4251 HTTP Sage master
HTTP
bidirectional Administrator
IP addresses
All nodes Port used to debug
the search engine.
Required ports for inter-cluster operation
Internally, ThoughtSpot uses static ports for communication between services in
the cluster. Do not close these ports from inter-cluster network communications.
In addition, a number of ports are dynamically assigned to services, which
change between runs. The dynamic ports come from the range of Linux
dynamically allocated ports (20K+).
Table 5: Network ports to open between the nodes in the cluster
Port Protocol Service Name Direction Source Dest. Description
80 TCP nginx inbound All nodes All nodes Primary app HTTP
port (nginx)
Installation and setup
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 35
Port Protocol Service Name Direction Source Dest. Description
443 TCP Secure nginx inbound All nodes All nodes Primary app
HTTPS port
(nginx)
2100 RPC Oreo RPC port bidirectional All nodes All nodes Node daemon RPC
2101 HTTP Oreo HTTP port bidirectional Admin IP
addresses
and all nodes
All nodes Node daemon
HTTP
2181 RPC Zookeeper servers
listen on this
port for client
connections
bidirectional All nodes All nodes Zookeeper servers
listen on this
port for client
connections
2200 RPC Orion master RPC
port
bidirectional All nodes All nodes Internal
communication
with the cluster
manager
2201 HTTP Orion master
HTTP port
bidirectional Admin IP
addresses
and all nodes
All nodes Port used to
debug the cluster
manager
2210 RPC Cluster stats
service RPC port
bidirectional All nodes All nodes Internal
communication
with the stats
collector
2211 HTTP Cluster stats
service HTTP port
bidirectional Admin IP
addresses
and all nodes
All nodes Port used to
debug the stats
collector
2230 RPC Callosum stats
collector RPC port
bidirectional All nodes All nodes Internal
communication
with the BI stats
collector
2231 HTTP Callosum stats
collector HTTP
port
bidirectional Admin IP
addresses
and all nodes
All nodes Port used to
debug the BI stats
collector
Installation and setup
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 36
Port Protocol Service Name Direction Source Dest. Description
2240 RPC Alert manager bidirectional All nodes All nodes Port where
alerting service
receives alert
events
2888 RPC Ports used by
Zookeeper servers
for communication
between
themselves
bidirectional All nodes All nodes Ports used by
Zookeeper servers
for communication
between
themselves
3888 RPC Ports used by
Zookeeper servers
for communication
between
themselves
bidirectional All nodes All nodes Ports used by
Zookeeper servers
for communication
between
themselves
4000 RPC Falcon worker RPC
port
bidirectional All nodes All nodes Port used by
data cache for
communication
between
themselves
4001 HTTP Falcon worker
HTTP port
bidirectional Admin IP
addresses
and all nodes
All nodes Port used to
debug the data
cache
4021 RPC Sage metadata
service port
(exported by
Tomcat)
bidirectional Admin IP
addresses
and all nodes
All nodes Port where search
service contacts
metadata service
for metadata
4201 HTTP Sage auto
complete server
HTTP interface
port
bidirectional Admin IP
addresses
and all nodes
All nodes Port used to
debug the search
service
Installation and setup
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 37
Port Protocol Service Name Direction Source Dest. Description
4231 HTTP Sage index server
HTTP port
bidirectional Admin IP
addresses
and all nodes
All nodes Port used to
debug the search
service
4232 RPC Sage index
server metadata
subscriber port
bidirectional All nodes All nodes Port used
for search
service internal
communication
4233 RPC Sage index server
RPC port
bidirectional All nodes All nodes Port used
for search
service internal
communication
4241 HTTP Sage auto
complete server
HTTP port
bidirectional Admin IP
addresses
and all nodes
All nodes Port used to
debug the search
service
4242 RPC Sage auto
complete server
RPC port
bidirectional All nodes All nodes Port used
for search
service internal
communication
4243 RPC Sage auto
complete server
metadata
subscriber port
bidirectional All nodes All nodes Port used for
search internal
communication
4251 RPC Sage master RPC
port
bidirectional All nodes All nodes Port used
for search
service internal
communication
4405 RPC Diamond
(graphite) port
bidirectional All nodes All nodes Port used for
communication
with monitoring
service
Installation and setup
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 38
Port Protocol Service Name Direction Source Dest. Description
4500 RPC Trace vault service
RPC port
bidirectional All nodes All nodes Trace collection
for ThoughtSpot
services
4501 HTTP Trace vault service
HTTP port
bidirectional Admin IP
addresses
and all nodes
All nodes Debug trace
collection
4851 RPC Graphite manager
RPC port
bidirectional All nodes All nodes Communication
with graphite
manager
4852 HTTP Graphite manager
HTTP port
bidirectional Admin IP
addresses
and all nodes
All nodes Debug graphite
manager
4853 RPC Elastic search
stack (ELK)
manager RPC port
bidirectional All nodes All nodes Communication
with log search
service
4853 HTTP Elastic search
stack (ELK)
manager HTTP
port
bidirectional Admin IP
addresses
and all nodes
All nodes Debug log search
service
5432 Postgres Postgres database
server port
bidirectional All nodes All nodes Communication
with Postgres
database
8020 RPC HDFS namenode
server RPC port
bidirectional All nodes All nodes Distributed file
system (DFS)
communication
with clients
8080 HTTP Tomcat bidirectional All nodes All nodes BI engine
communication
with clients
8787 HTTP Periscope (UI)
service HTTP port
bidirectional All nodes All nodes Administration UI
back end
Installation and setup
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 39
Port Protocol Service Name Direction Source Dest. Description
8888 HTTP HTTP proxy server
(tinyproxy)
bidirectional All nodes All nodes Reverse SSH
tunnel
11211 Mem-
cached
Memcached server
port
bidirectional All nodes All nodes BI engine cache
12345 ODBC Simba server port bidirectional All nodes All nodes Port used for ETL
(extract, transform,
load)
50070 HTTP HDFS namenode
server HTTP port
bidirectional All nodes All nodes Debug DFS
metadata
50075 HTTP HDFS datanode
server HTTP port
bidirectional All nodes All nodes Debug DFS data
Required ports for inbound and outbound cluster access
ThoughtSpot uses static ports for inbound and outbound access to a cluster.
Table 6: Network ports to open for inbound access
Port Protocol Service Name Direction Source Dest. Description
22 SCP SSH bidirectional ThoughtSpot
Support
All nodes Secure shell access.
80 HTTP HTTP bidirectional ThoughtSpot
Support
All nodes Hypertext Transfer
Protocol for website
traffic.
443 HTTPS HTTPS bidirectional ThoughtSpot
Support
All nodes Secure HTTP.
12345 TCP Simba bidirectional ThoughtSpot
Support
All nodes Port used by ODBC
and JDBC drivers
when connecting to
ThoughtSpot.
Installation and setup
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 40
Table 7: Network ports to open for outbound access
Port Protocol Service Name Direction Source Destination Description
443 HTTPS HTTPS outbound All nodes 208.83.
110.20
For transferring files to
thoughtspot.egnyte.com
(IP address 208.83.110.20).
25
or
587
SMTP SMTP or
Secure SMTP
outbound All nodes
and SMTP
relay
(provided by
customer)
All nodes Allow outbound access
for the IP address of
whichever email relay
server is in use. This is
for sending alerts to
ThoughtSpot Support.
389
or
636
TCP LDAP or
LDAPS
outbound All nodes
and LDAP
server
(provided by
customer)
All nodes Allow outbound access
for the IP address of the
LDAP server in use.
Required ports for IPMI (Intelligent Platform Management Interface)
ThoughtSpot uses static ports for out-of-band IPMI communications between
the cluster and ThoughtSpot Support.
Table 8: Network ports to open for IPMI access
Port Protocol Service Name Direction Source Dest. Description
80 HTTP HTTP bidirectional ThoughtSpot
Support
All
nodes
Hypertext Transfer
Protocol for website traffic.
About load balancing and proxies
A load balancer is needed in front of a server group in order to direct traffic to
individual servers in a way that maximizes efficiency. Here are some of the best
practices and guidelines for a typical implementation with ThoughtSpot. Your
experience may differ depending on your environment and preference.
Installation and setup
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 41
Load balance across ThoughtSpot nodes
The load balancer is an appliance in your infrastructure that routes traffic
automatically to nodes to provide failover. You can also place a load balancer or
proxy in front of the ThoughtSpot appliance if you'd like external network users
to access the system.
The best way to load balance across all ThoughtSpot nodes in a cluster is to map
one domain name (FQDN) to all the IPs in the cluster in a round robin fashion.
For example, if you want to use a DNS server based load balancing, then you can
define multiple "A" resource records (RR) for the same name.
Below is an example of how you could set that up
thoughtspot.customer.com IN A 69.9.64.11
thoughtspot.customer.com IN A 69.9.64.12
thoughtspot.customer.com IN A 69.9.64.13
thoughtspot.customer.com IN A 69.9.64.14
The example indicates that IP addresses for the domain
thoughtspot.customer.com are 69.9.64.11, 69.9.64.12, 69.9.64.13, and 69.9.64.14.
Session Affinity
Session Affinity refers to directing requests to the same application server for
the time it takes to complete a task.
In order for session affinity to work on ThoughtSpot, HTTPS (an SSL certificate)
has to be installed on the load balancer level. If it is installed outside of the load
balancer, session affinity may not occur and the ThoughtSpot system will fail.
Web proxies
You can access ThoughtSpot through any standard web proxy server. Web
proxies are fairly universal regardless of the application they are proxying.
However, ThoughtSpot doesn't use any new protocols, like SPDY or HTTP/2,
Installation and setup
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 42
which may have a dependency on the proxy. Instead, ThoughtSpot is commonly
placed behind a web HTTP/HTTPS proxy.
Additionally, the proxy can round robin across multiple nodes in the
ThoughtSpot backend. You can essentially use the web proxy as a load balancer.
Therefore, your session will carry over if the proxy round robins between the
ThoughtSpot backends as long as the URL doesn’t change.
Network architectural diagram
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 43
Chapter 2: Load and manage data
Load and manage data
Topics:
About case configuration
Generate CSV files with
the data to be loaded
Load data from a web
browser
Append data from a web
browser
Plan the schema
Build the schema
Change the schema
Load data with
ThoughtSpot Loader
Delete a data source
About the Schema Viewer
There are several methods of loading data into
ThoughtSpot. This section describes each method
and why you might choose it above the others.
The fastest and easiest way to load a new table is
by importing it using the Web browser. This is best
for one time data loads of small tables which do
not have complex relationships to other tables. This
method is limited to tables that are under 50 MB
(megabytes) in size.
Using ThoughtSpot Loader, you can script recurring
loads and work with multi-table schemas.
If your data already exists in another database with
the schema you want to use in ThoughtSpot, you can
pull the schema and data in using the ODBC or JDBC
driver.
These are the methods you can use to load data,
along with the benefits of each method:
Table 9: Data loading methods
Method Description Benefits
Load data from a
web browser
Use the ThoughtSpot
Web interface to
upload an Excel
or CSV (comma
separated values)
file from your local
machine.
Easy way to do
a one-time data
load of a small file
(under 50MB).
End users can
upload their own
Load and manage data
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 44
Method Description Benefits
data and explore it
quickly.
Use ThoughtSpot
Data Connect. For
details, see the
ThoughtSpot Data
Connect Guide
This is a premium
feature, available at
additional cost. Use
ThoughtSpot Data
Connect to connect
directly to external
data sources and pull
in tables and columns
from them. You can
also set up recurring
loads to keep the
data fresh.
Easy way to
connect to multiple
sources of data
directly and set
up recurring
loads. You won't
need to define a
schema to accept
the data loads,
because this is done
automatically for
you.
Load data with
ThoughtSpot
Loader
Use TQL and tsload
to load data directly
into the back end
database that
ThoughtSpot uses.
Best way to load
large amounts of
data or a schema
with multiple tables.
Can be scripted and
used for recurring
data loads, such
as monthly sales
results or daily logs.
Can be integrated
with an ETL
solution for
automation.
Use the ODBC/
JDBC driver
to connect to
ThoughtSpot
Use the ODBC or
JDBC client with
your ETL tool. For
information, see the
ThoughtSpot Data
Integration Guide.
Make use of an
established ETL
process and tool(s).
Connect to
ThoughtSpot using
Load and manage data
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 45
Method Description Benefits
third party tools like
SSIS.
You don't need to
define a schema
to accept the data
load.
Use the
Informatica
Connector
Use the Informatica
Connector if
you already use
Informatica to
connect to your
other data sources.
For information, see
the ThoughtSpot
Data Integration
Guide.
Works with your
established data
migration processes
in Informatica.
If you're uploading data through the Web interface,
you can use a native Excel file. If you want to use a
CSV (comma separated values) or delimited file, or
you are loading using ThoughtSpot Loader, you'll
need to Generate CSV files with the data to be
loaded first.
Note: End users will almost always work with
worksheets and data they upload.
Load and manage data
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 46
About case configuration
You can set the type of case sensitivity you would like to see reflected in the
ThoughtSpot display.
Before you load your data, you should consider the type of casing you would like
your data to reflect. The case sensitivity for source data strings is preserved in
the display. So, the visual display of results is identical to the input case that is
loaded.
Note: The casing will remain lowercase in other parts of the application,
such as when you ask a question or filter.
It is important to note that string casings aren’t applied globally, but by column.
So datasets will have different string casings as long as they're in different
columns. Tables that are already compacted will keep their lowercase format.
In these cases, in order to get the specific string case that you want, you would
have to truncate related tables and reload them.
To take advantage of case configuration, you need to have ThoughtSpot Support
enable it on your cluster for you. In addition, title casing should be disabled for
string casing to properly work.
Generate CSV files with the data to be loaded
The first step in loading data is to obtain or create one or more flat files that
contain the data to be loaded into ThoughtSpot.
Your data should be in a CSV (comma separated values) or delimited flat file
before you load it. A CSV file is a text file made up of data fields separated by
a delimiter and optionally enclosed with an enclosing character. If your data
contains multiple tables, you'll have a separate CSV for each table.
A CSV file contains:
Load and manage data
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 47
A delimiter that marks the separation between fields in the data. The delimiter
is usually comma, but it can be any character.
Fields optionally enclosed with double quotes.
Use these guidelines when creating the CSV file:
Columns in the CSV file must be in the same order as defined in the target
table.
If the CSV contains column headers, they must match the column names in
the database exactly.
Often a | (pipe) or tab is used as the delimiter, because it may be less likely to
occur within the data values.
When a field contains a double quote, it must be escaped with the character
specified in the escape character argument in tsload.
When a field contains the delimiter, the field must be enclosed in double
quotes.
For more information about CSV files and the rules for creating them, check
http://en.wikipedia.org/wiki/Comma-separated_values.
1. If your source is another database:
a) Connect to the source database.
b) Export each of the tables you wish to import into ThoughtSpot as a CSV
file, specifying a delimiter of comma, | (pipe) or tab.
2. If your source is an Excel spreadsheet, save it as a CSV file.
Load data from a web browser
The simplest way to load data is to upload a CSV or Excel file from the
ThoughtSpot Web interface. This method is recommended for small, one time
data loads. Using this method, the data schema is created for you automatically.
Load and manage data
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 48
Loading data from a Web browser requires your data to be in a CSV (comma
separated values) or a native Excel file.
Any user who belongs to a group that has the privilege Has administration
privileges or Can upload user data will be able to upload their own data from
the browser.
CSV is a common format for transferring data between databases. Your ETL
(extract, transform, load) process will typically generate CSV files. You can also
create a CSV file from a Microsoft Excel spreadsheet by opening the spreadsheet
in Excel, choosing Save As and selecting CSV.
ThoughtSpot supports a wide range of date and timestamp formats in the CSV
file.
Loading data through the Web browser is recommended for smaller tables
(under 50MB) with simple relationships between them. If you are loading a fact
table that joins to dimension tables, you must load the fact table first, and then
the dimension tables. The joining key must be a single column of unique values in
the dimension table. NULL values in the fact table will not be able to be joined.
Blank values in user uploaded CSV files are interpreted as NULL values. These
include the values (case insensitive):
• NULL
• \N
• NA
• N/A
• [space]
To load the CSV or Excel file into ThoughtSpot:
1. Log in to ThoughtSpot from a browser.
2. Click on Data, on the top navigation bar.
Load and manage data
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 49
Figure 3: Data
3. Click the Actions button in the upper right corner, and select Upload Data.
Figure 4: Upload data
4. Upload the CSV or Excel file by doing one of these options:
Click on Browse your files and select the file.
Drag and drop the file into the drop area.
5. Answer the question Are the column names already defined in the file
header?
6. Answer the question Are the fields separated by? Click Next.
7. Click on the column header names to change them to more useful names, if
you'd like. Click Next.
8. Review the automatically generated data types for each column, and make
any changes you want. There are four data types: Text, Integer, Decimal, and
Date.
9. Click Import.
Load and manage data
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 50
10.Click Link to Existing Data if you want to link the data you uploaded to the
data in another table or worksheet. Or click Search if you want to begin a new
search.
Append data from a web browser
You can append data to your existing system tables through the ThoughtSpot
application, even if the tables were initially loaded using Data Connect or tsload.
Loading data from a Web browser requires your data to be in a CSV (comma
separated values) or a native Excel file. The file must have the same structure as
the table it is being loaded into, including number and type of columns, in the
same order as the target table.
Any user who belongs to a group that has the privilege Has administration
privileges or Can upload user data will be able to upload their own data from
the browser.
CSV is a common format for transferring data between databases. Your ETL
(extract, transform, load) process will typically generate CSV files. You can also
create a CSV file from a Microsoft Excel spreadsheet by opening the spreadsheet
in Excel, choosing Save As and selecting CSV.
ThoughtSpot supports a wide range of date and timestamp formats in the CSV
file.
Loading data through the Web browser is recommended for smaller tables
(under 50MB) with simple relationships between them. If you are loading a fact
table that joins to dimension tables, you must load the fact table first, and then
the dimension tables. The joining key must be a single column of unique values in
the dimension table. NULL values in the fact table will not be able to be joined.
Blank values in user uploaded CSV files are interpreted as NULL values. These
include the values (case insensitive):
• NULL
Load and manage data
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 51
• \N
• NA
• N/A
• [space]
To append data into ThoughtSpot:
1. Log in to ThoughtSpot from a browser.
2. Click on Data, on the top navigation bar.
Figure 5: Data
3. Click the on the table you would like to append data to.
4. Click the Load data button.
Figure 6: Load data
5. Upload the CSV or Excel file by doing one of these options:
Click on Browse your files and select the file.
Drag and drop the file into the drop area.
6. Answer the question Are the column names already defined in the file
header?
7. Answer the question Do you want to append to the existing data or
overwrite it?
8. Answer the question Are the fields separated by? Click Next.
Load and manage data
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 52
9. Click on the column header names to change them to more useful names, if
you'd like. Click Next.
10.Review the automatically generated data types for each column, and make
any changes you want. There are four data types: Text, Integer, Decimal, and
Date.
11. Click Import.
12.Click Link to Existing Data if you want to link the data you uploaded to the
data in another table or worksheet. Or click Search if you want to begin a new
search.
Plan the schema
Before you can load data with ThoughtSpot Loader, you must create a schema
to receive it, using the SQL command line interface (TQL).
The TQL syntax is similar to the SQL used in other relational databases, but
with some important differences. You'll use DDL (data definition language) to
create the schema into which you'll load the data. You'll probably want to put all
your DDL statements into a text file, which you'll use as a script for creating the
schema.
Before writing your TQL script, you need to understand some basic ThoughtSpot
concepts.
About databases and schemas
ThoughtSpot organizes objects in a hierarchical namespace. Databases contain
schemas, which contain tables.
Load and manage data
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 53
Figure 7: Namespace diagram
ThoughtSpot can contain one or more databases, and each database can
have multiple schemas. If you do not specify a schema, the default schema
(falcon_default_schema) is used automatically. This makes it easier to add tables
to the database without the need to explicitly create a schema.
If you do create an additional schema, you must refer to its objects using the
syntax <schemaname>.<objectname>. If you do not qualify the schema when
referencing its objects, the default schema (falcon_default_schema) will always
be assumed.
By default, ThoughtSpot creates an internal database to host tables
corresponding to data that is imported by users from a Web browser.
Review the structure of your data
The schema you create to hold the data needs to be a good fit for your data.
First, familiarize yourself with the tables you want to load, and understand their
structure. Make note of this information for each table:
The column names and data types
Type of table (fact or dimension)
Primary key column(s)
The size of the table on disk
Any other tables it can be joined with (foreign keys)
Here's what you'll need to take into account in your TQL for creating each table,
based on these properties:
Load and manage data
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 54
Table 10: Table properties that determine schema settings
Table type Table
size
To be joined
with
Schema recommendations
Fact Any Small
dimension
table(s)
• Sharded.
Foreign key references the primary key in the
dimension table.
Fact Any Large
dimension
table(s)
Sharded on the same distribution key as the dimension
table it will be joined with.
Foreign key references the primary key in the
dimension table.
Fact Any Another fact
table
Sharded on the same distribution key as the fact table
it will join with.
Many-to-many relationship defines how the tables will
be joined.
Dimension under
50MB
Fact table(s) Replicated (not sharded).
Has a primary key.
Dimension over
50MB
Fact table(s) Distributed dimension table, sharded on the same
distribution key as the fact table it will be joined with.
Primary key must be the same as the distribution key.
Data types
ThoughtSpot supports the common data types. Compare these with the data
types you want to load, and do any necessary conversion ahead of loading the
data.
Supported data types
The tables you create to receive the data must have the same number of
columns and data types as the data you will be loading. Choose a data type for
each column from the list of supported data types:
Load and manage data
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 55
Table 11: Supported data types
Kind of
data
Supported data
types
Details
Character • VARCHAR(n) Specify the maximum number of characters, as in
VARCHAR(255). The size limit is 1GB for VARCHAR values.
Floating
point
• DOUBLE
• FLOAT
DOUBLE is recommended.
Boolean • BOOL Can be true or false.
Integer • INT
• BIGINT
INT holds 32 bits.
BIGINT holds 64 bits.
Date or
time
• DATE
• DATETIME
• TIMESTAMP
• TIME
DATETIME, TIMESTAMP, and TIME are stored at the granularity
of seconds.
TIMESTAMP is identical to DATETIME, but is included for syntax
compatibility.
Note: There is a 1GB limitation on the number of characters for VARCHAR.
If you have any VARCHAR data that exceeds this limit, the entire load will
fail.
Geographical data types
For geographical data types, use VARCHAR. For latitude and longitude, you
can use either VARCHAR or DOUBLE. After loading the data, designate it as
a geographical data type when you Model data in bulk in the modeling file.
Wherever abbreviations or codes are used, they are the same as what the USPS
(United States Postal Service) recognizes.
These types of data can be designated as geographical data, which enables
them to be visualized using the Geo chart types:
Load and manage data
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 56
Table 12: Data that uses geo charts
GeoType Description Type: Example
COUNTRY_REGION Countries name: United States
long name: United States
name_sort: United States of America
abbreviation: U.S.A.
adm0_a3: USA
adm0_a3_is: USA
adm0_a3_us: USA
admin: United States of America
brk_a3: USA
brk_name: United States
formal_en: United States of America
iso_a2: US
iso_a3: USA
iso_n3: 840
COUNTY Counties in the
United States
santa clara county
pike county, ohio
pike county, OH
STATE_PROVINCE States in the United
States
name: California
US Postal Service abbreviation: CA
LATITUDE Must be used with
LONGITUDE
• 37.421023
• 1.282911
LONGITUDE Must be used with
LATITUDE
122.142103
• 103.848865
ZIP_CODE Zip codes and zip
codes +4 in the
United States
po_name: MT MEADOWS AREA
ZIP: "00012"
zip2: 12
Other Sub-nation Regions Administrative
regions found in
• bremen
Load and manage data
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 57
GeoType Description Type: Example
countries other
than the United
States
• normandy
west midlands
Note: You cannot upload your own custom boundaries.
Constraints
Constraints include primary keys, foreign keys, and relationships. Relationships
allow you to create a generic relationship for use when you want to join tables
that don't have a primary key/foreign key relationship.
Primary keys
When a primary key is selected for a table, it impacts data loading behavior.
When a new row is added:
If another row already exists with same primary key, it is updated with the
values in the new row.
If a row with the same primary key does not exist already, the new row is
inserted into the table.
This behavior is referred to as “upsert” because it does an INSERT or an UPDATE,
depending on whether a row with the same primary key already exists.
Note that ThoughtSpot does not check for primary key violations across
different shards of the table. Therefore, you need to shard the table on the
primary key columns if you require this “upsert” behavior.
Foreign key relationships
Foreign key relationships help ThoughtSpot with default schema modeling
by indicating a connection between two tables. These relationships are used
for joining the tables, and not for referential integrity constraint checking. The
Load and manage data
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 58
foreign key relationship is defined on the fact table and references the primary
key(s) in the dimension table.
If you use primary and foreign keys, when users search the data from the search
bar, tables are automatically joined. For example, assume there are two tables:
revenue, which is a fact table
region, which is a dimension table
There is a foreign key on the fact table on regionid which points to the id in the
region dimension table. When a user types in "revenue by region", the two tables
will be joined automatically.
Foreign keys have to match the primary key of the target table they refer to. So if
there are multiple columns that make up the primary key in the target table, the
foreign key must include all of them, and in the same order.
Generic relationships (many-to-many)
You may have a schema where there is a fact table that you want to join with
another fact table. If there isn't a primary key/foreign key relationship between
the tables, you can use many-to-many to enable this. You can do this by using
the RELATIONSHIP syntax to add a link between them, that works similarly to
the WHERE clause in a SQL join clause.
This is a special kind of relationship, that applies to specific data models and use
cases. For example, suppose you have a table that shows wholesale purchases
of fruits, and another table that shows retail fruit sales made, but no inventory
information. In this case, it would be of some use to see the wholesale purchases
that led to sales, but you don't have the data to track a single apple from
wholesale purchase through to sale to a customer.
In a many-to-many relationship, the value(s) in a table can be used to join to
a second table, using an equality condition (required) and one or more range
conditions (optional). These conditions act like the WHERE clause in a SQL JOIN
Load and manage data
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 59
clause. They are applied using AND logic, such that all conditions must be met
for a row to be included.
To use a many-to-many relationship, you need to follow a few rules:
There must be one equality condition defined between the two tables.
Each table must be sharded on the same key that will be used for the equality
condition.
There can optionally be one or more range conditions defined.
This example shows the TQL statements that create the two fact tables and the
relationship between them.
TQL> CREATE TABLE "wholesale_buys" (
"order_number" VARCHAR(255),
"date_ordered" DATE,
"expiration_date" DATE,
"supplier" VARCHAR(255),
"fruit" VARCHAR(255),
"quantity" VARCHAR(255),
"unit_price" DOUBLE
) PARTITION BY HASH (96) KEY ("fruit");
TQL> CREATE TABLE "retail_sales" (
"date_sold" DATE,
"location" VARCHAR(255),
"vendor" VARCHAR(255),
"fruit" VARCHAR(255),
"quantity" VARCHAR(255),
"sell_price" DOUBLE
) PARTITION BY HASH (96) KEY ("fruit");
TQL> ALTER TABLE "wholesale_buys" ADD RELATIONSHIP WITH "retail_sales"
AS "wholesale_buys"."fruit" = "retail_sales"."fruit" and
("wholesale_buys"."date_ordered" < "retail_sales"."date_sold" and
"retail_sales"."date_sold" < "wholesale_buys"."expiration_date");
Load and manage data
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 60
Note that this many-to-many implementation does not protect from over
counting in some searches. If you plan to use it, make sure your searches don't
include aggregation or count searches that will count one value multiple times,
because it satisfies the join condition for multiple rows.
Sharding
For the best performance, you should split (or shard) very large tables across
nodes. If you have a large dimension table, you might choose to co-shard it with
the fact table it will be joined with.
Sharding a fact table
Use sharding to split large tables into parts for distribution across nodes. This is
typically done with large fact tables, to provide optimal performance.
When sharding, you'll choose a column to be the distribution key. This column
should contain a value that has a good distribution (roughly similar number of
rows with each value in that column). This is typically the primary key, but it can
be any single column or a set of columns.
The recommended number or shards depends upon the number of nodes in your
cluster:
Table 13: Recommended number of shards based on number of nodes
Number of Nodes Number of Shards
3 96
4-12 128
13-24 256
25-36 384
37-48 512
49-60 640
61-72 768
Load and manage data
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 61
To specify the number of shards, add PARTITION BY HASH ( ) to your CREATE
TABLE statement, specifying the number of shards and the sharding key. For
example:
TQL> CREATE TABLE ...
...PARTITION BY HASH (96) KEY ("customer_id");
If no sharding is specified or the number of shards specified is one, the table is
assumed to be unsharded (i.e. the table physically exists on each node).
If no sharding key is specified, but the number of shards is greater than one, the
table is assumed to be sharded randomly. The system does not use primary keys
as sharding keys by default.
If you want to use the primary key for sharding, you must specify that the table is
to be partitioned by hash on the primary key, as in this example:
TQL> CREATE TABLE "supplier" (
"s_suppkey" BIGINT,
"s_name" VARCHAR(255),
"s_address" VARCHAR(255),
"s_city" VARCHAR(255),
"s_phone" VARCHAR(255),
CONSTRAINT PRIMARY KEY ("s_suppkey")
) PARTITION BY HASH (96) KEY ("s_suppkey");
Sharded (distributed) dimension tables
In a typical schema, you'd have a sharded fact table with foreign keys to
replicated dimension tables (which exist on every node). This works best where
dimension tables are small (under 50MB). So if your dimension tables are small,
you should shard the fact tables and not shard the dimension tables they will be
joined with.
If you have a large dimension table, you can distribute it the same way as the
fact table it joins to. You might choose to use distributed dimension tables if:
Load and manage data
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 62
The dimension table is large (over 50MB).
The tables are always joined using the same columns.
If all of these requirements for the distributed dimension table are met, the
tables are automatically co-sharded for you:
The tables are related by a primary key and foreign key.
The tables are partitioned on the same primary key/foreign key.
The tables have the same number of regions (or shards).
If the dimension table will be joined to multiple fact tables, all of the fact tables
must be sharded in the same way as the dimension table. Self-joins are not
supported.
When a fact and its dimension table are co-sharded:
The two tables will always be joined on the sharding key.
Data skew can develop if a very large proportion of the rows have the same
sharding key.
This example shows the CREATE TABLE statements that meet the criteria for co-
sharding of a fact table and a distributed dimension table:
TQL> CREATE TABLE products_dim (
"id" int,
"prod_name" varchar(30),
"prod_desc" varchar(100),
PRIMARY KEY ("id")
)
PARTITION BY HASH (96) KEY ("id")
;
TQL> CREATE TABLE retail_fact (
"trans_id" int,
"product_id" int,
"amount" double,
FOREIGN KEY ("product_id") REFERENCES products_dim ("id")
)
PARTITION BY HASH (96) KEY ("product_id")
;
Load and manage data
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 63
Sharded (distributed) face tables
You can also join two sharded fact tables with different shard keys, otherwise
know as non co-sharded tables. It may take a while to join two tables sharded
on different keys since a lot of data redistribution is required. Therefore,
ThoughtSpot recommends that you use a common shard key for two fact tables.
Chasm traps
In a complex schema, you may have a fact table with no relationship to another
fact table, except that each contains a foreign key to a shared dimension table.
This is known as a chasm trap, and ThoughtSpot can handle it!
A chasm trap in a data schema can introduce problems of over counting if you
join the two fact tables through their shared dimension table. This diagram
shows a typical complex schema with several tables that are related over a
chasm trap:
Figure 8: Complex schema with chasm traps
Examples of use cases where a chasm trap could occur are:
Load and manage data
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 64
Attribution analysis comparing campaign data with purchase data, where all
they have in common is that both contain a customer identifier which is a
foreign key to a customer dimension table.
Cost of sales analysis when the wholesale orders data is only related to the
retail sales data through a shared products dimension table.
In many databases, joining tables across a chasm trap creates a Cartesian
product, such that each row from the first fact table is joined to each row from
the second table. This produces over counting when computing counts and
aggregates. But ThoughtSpot protects you from this kind of over counting.
There are still just a few things to look out for when using a schema that contains
chasm traps:
The tables need to be joined to the dimension table via an equi-join (i.e. a
primary key/foreign key relationship). They cannot be joined using a range of
values.
Review the column setting called Attribution Dimension. You may need to
change this setting if some of the columns in the shared dimension table
should not be used for attribution when combining fact tables.
Tables that will be joined across a chasm trap do not need to be co-sharded.
They will be joined appropriately automatically in the most efficient way.
Chasm trap limitations
If your database schema contains any chasm traps, you may encounter these
limitations.
Operations that produce an error for chasm traps
The following limitations on chasm trap schemas will produce a red bar error in
the ThoughtSpot application:
Load and manage data
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 65
Show underlying data does not work for chasm trap searches, whether the
search is on a worksheet containing a chasm trap or on base tables that are
related over a chasm trap.
When using the ThoughtSpot APIs, you cannot pass filter values via the URL
if the relevant searches occur on a worksheet containing a chasm trap or on
base tables that are related over a chasm trap.
Legacy Row Level Security does not work with chasm trap schemas. The
newer Rule-Based Row Level Security must be used.
Behavior that is different for chasm traps
The following behavior is different for chasm traps than for schemas that do not
contain a chasm trap:
There are no headlines (single facts based on the data) shown when a search
contains a worksheet containing a chasm trap or base tables that are related
over a chasm trap.
Join information in What am I Looking At? does not appear for searches on
a worksheet containing a chasm trap or on base tables that are related over a
chasm trap.
There are cases when attempting to configure certain charts on chasm trap
worksheets or tables will not work. If this happens, you will see the error Your
search needs to have unique y-axis values for each series of data shown on
the x-axis. The workaround is to remove all columns from the search, except
for those used in your chart.
Workarounds
In some cases, there is a workaround of saving an answer as a worksheet
(Aggregated Worksheet). See the ThoughtSpot User Guide for details on
how to do this. If you save a chasm trap search as a worksheet, it becomes a
materialized view of the answer. Effectively, it is then just a regular table (no
Load and manage data
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 66
chasm trap). As such, most of the issues above will not affect searches on the
saved worksheet.
Build the schema
Before you can load data into ThoughtSpot, you must build a database schema
to receive it. You will do this by writing a SQL script, which creates the objects in
your schema.
Your SQL script can use any SQL statements that are supported in ThoughtSpot
SQL Command Line (TQL). The TQL syntax is similar to the SQL used in other
relational databases, but with some important differences. You'll use DDL (data
definition language) to create the schema into which you'll load the data. You'll
probably want to put all your DDL statements into a text file, which you'll use as
a script for creating the schema.
Uploading the SQL script through the browser
You can upload an existing SQL script directly through the browser in the
ThoughtSpot application. You can edit the script or add to it right within the
browser, too.
The steps to build a schema through the browser are:
1. Write a SQL script to create the schema
2. Import a schema (use the SQL editor)
Using TQL to create the schema
You can choose to run your SQL script within the Linux shell, instead of
uploading it through the browser.
The steps to build a schema using TQL include:
1. Connect to the database with the ThoughtSpot SQL Command Line (TQL).
2. Write a SQL script to create the schema.
Load and manage data
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 67
About TQL, the SQL command line interface
ThoughtSpot provides the ThoughtSpot SQL Command Line (TQL) for creating,
viewing, and managing a schema using SQL. You can run TQL in interactive
command line mode, or you can write a script and use TQL to run it.
TQL basics
The SQL syntax in ThoughtSpot is called TQL for ThoughtSpot SQL. The
ThoughtSpot SQL Command Line (TQL) runs in an interactive mode. To invoke
TQL Log in to the Linux shell using SSH and type tql. At the prompt, type h or
help to see a list of supported commands.
Type your SQL commands on the command line, terminating each command
with a semicolon (;). Commands can span multiple lines. ThoughtSpot supports
a limited number of SQL commands, plus some custom SQL extensions. For
example, you can specify the number of shards and the distribution key as part
of the CREATE TABLE syntax. A full list of supported SQL in TQL is available in
the TQL reference.
You can also Write a SQL script to create the schema in TQL.
Connect to the database with the ThoughtSpot SQL Command Line (TQL)
To perform administrative tasks directly in the database, you will use the
ThoughtSpot SQL Command Line (TQL). TQL supports many, but not all,
common SQL commands.
Before connecting with TQL, you will need:
Access to your ThoughtSpot instance Linux shell from a client machine.
The administrator OS login.
To connect to TQL:
1. Log in to the Linux shell using SSH.
Load and manage data
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 68
2. Invoke TQL:
$ tql
TQL>
3. Enter your SQL command, followed by a semicolon (;).
Create the schema in TQL
Having examined the structure of the data to be loaded and become familiar
with the ThoughtSpot SQL Command Line (TQL), you are now ready to create
the schema.
This method is a good way to get familiar with TQL and how to create database
objects, but when creating a schema in a production system, you will most likely
Write a SQL script to create the schema.
To create the schema directly in TQL:
1. Connect to the database with the ThoughtSpot SQL Command Line (TQL).
2. If the database you will be using does not exist, create it now:
TQL> CREATE DATABASE my_database;
3. Connect to the database:
TQL> USE my_database;
4. If you wish to use a schema other than the default one, create it now:
TQL> CREATE SCHEMA my_schema;
5. Issue a CREATE TABLE command for each table you will create, using the
information in Plan the schema.
Note: Foreign key declaration within a CREATE TABLE will show the
table created even if there are problems with the foreign key. Therefore,
it is good practice to also issue a separate ALTER TABLE ADD CONSTRAINT
FOREIGN KEY command.
Load and manage data
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 69
Write a SQL script to create the schema
Using a SQL script to create your schema is a recommended best practice. This
makes it easier to adjust the schema definitions and recreate the schema quickly,
if needed.
The schema creation script is a text file that contains all the SQL commands to
create your schema. Comments should be enclosed in the comment tags /* and
*/.
It is recommended to enclose all object names (schema, table, and column) in
double quotes and any column values in single quotes in your scripts. Object
names that are also reserved words in SQL, or that contain special characters
(any character other than alphanumeric or _ ), must be surrounded by double
quotes. If you see the error message "Error parsing SQL. Check SQL input.", you
should check for object names without double quotes in your script.
If you are working in a schema other than the default schema, object names
must be fully qualified, as in "<schema_name>"."<object_name>".
If your schema includes constraints to define relationships between tables
(foreign key, or the RELATIONSHIP syntax), it is recommended that your script
first creates all the tables, and then at the end, creates the relationships between
them using the ADD CONSTRAINT syntax. This makes it easier to troubleshoot
the script and make changes.
If TQL is run using the flag --allow_unsafe, your statements will always execute
without this warning. Note that when running TQL from a script, you will
need to decide what behavior you want if the script contains changes that
affect dependent objects. If you want the script to run even if objects with
dependencies are afffected, run it using this flag, for example:
cat safest_script_ever.sql | tql --allow_unsafe
1. Open a new file in a text editor.
Load and manage data
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 70
2. Type in the command to create the database, if it does not already exist:
CREATE database <db_name>;
3. Type in the command to specify the database to use:
USE database <db_name>;
4. Type in the command to create the schema, if you don't want to use the
default schema:
5. Type in each of the CREATE TABLE statements, with its column definitions,
primary key constraints, and sharding specification (if any).
6. At the end of your script, optionally type in the ALTER TABLE statements to
add foreign keys to use in joining the tables.
7. Save the file.
8. Run the script using one of these methods:
Import a schema (use the SQL editor).
Log in to the shell, copy your script to your ThoughtSpot instance using
scp, and pipe it to TQL:
$ cat create_schema.sql | tql
Schema creation examples
These examples demonstrate the steps involved in creating a schema using the
ThoughtSpot SQL Command Line (TQL). After the schema is created, you can
load data into it with ThoughtSpot Loader.
Simple schema creation example
The example creates a database (tpch) with two tables (customer, transaction).
The example does not create a schema explicitly. So it will use the default
schema (falcon_default_schema).
In this example:
Load and manage data
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 71
The table customer has a primary key called customer_id. The table
customer_transactions has a primary key called transaction_id.
The customer table is unsharded.
The customer_transactions table is sharded into 96 shards using the
transaction_id column.
Both tables have referential integrity on customer_id.
$tql
TQL> CREATE DATABASE tpch;
TQL> USE tpch;
TQL> CREATE TABLE customer (
name VARCHAR(100),
address VARCHAR(255),
zipcode INT,
customer_id INT,
CONSTRAINT PRIMARY KEY (customer_id)
);
TQL> CREATE TABLE customer_transactions (
transaction_id INT,
customer_id INT,
amount DOUBLE,
transaction_date DATETIME,
CONSTRAINT PRIMARY KEY (transaction_id),
CONSTRAINT FOREIGN KEY (customer_id) REFERENCES
customer(customer_id)
) PARTITION BY HASH (96) KEY (transaction_id);
Load and manage data
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 72
More complex schema creation example
The example uses a custom schema called sample_schema to hold the tables.
Because of this, every table reference has to be schema qualified.
$ tql
TQL> CREATE DATABASE "sample_db";
TQL> USE "sample_db";
TQL> CREATE SCHEMA "sample_schema";
TQL> CREATE TABLE "sample_schema"."customer" (
"c_custkey" BIGINT,
"c_name" VARCHAR(255),
"c_address" VARCHAR(255),
"c_city" VARCHAR(255),
"c_nation" VARCHAR(255),
"c_region" VARCHAR(255),
"c_phone" VARCHAR(255),
CONSTRAINT PRIMARY KEY ("c_custkey")
);
TQL> CREATE TABLE "sample_schema"."supplier" (
"s_suppkey" BIGINT,
"s_name" VARCHAR(255),
"s_address" VARCHAR(255),
"s_city" VARCHAR(255),
"s_nation" VARCHAR(255),
"s_region" VARCHAR(255),
"s_phone" VARCHAR(255),
CONSTRAINT PRIMARY KEY ("s_suppkey")
);
TQL> CREATE TABLE "sample_schema"."lineorder" (
"lo_orderkey" BIGINT,
"lo_linenumber" BIGINT,
Load and manage data
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 73
"lo_custkey" BIGINT,
"lo_partkey" BIGINT,
"lo_suppkey" BIGINT,
"lo_orderdate" DATE,
"lo_orderpriority" VARCHAR(255),
"lo_shippriority" VARCHAR(255),
"lo_quantify" BIGINT,
"lo_extendprice" BIGINT,
"lo_ordtotalprice" BIGINT,
"lo_discount" BIGINT,
"lo_commitdate" DATE,
CONSTRAINT PRIMARY KEY ("lo_orderkey","lo_linenumber"),
CONSTRAINT FOREIGN KEY ("lo_custkey") REFERENCES
"sample_schema"."customer" ("c_custkey"),
CONSTRAINT FOREIGN KEY ("lo_suppkey") REFERENCES
"sample_schema"."supplier" ("s_suppkey")
) PARTITION BY HASH (96) KEY (lo_orderkey);
Import a schema (use the SQL editor)
You can run a SQL script to create your database schema through the browser,
without having to log in to the shell on the ThoughtSpot instance. You can edit
the script and run it directly in the browser to create the schema.
Importing a schema through the Web browser makes it possible to run your SQL
script without needing to have a Linux login. You can use this capability in any of
these ways:
Create the SQL script ahead of time, and use the browser to run it.
Use the editor to type your SQL directly into the browser.
Use the browser SQL interface as an interactive SQL editor, for example to
test an existing script or make changes to an existing schema.
1. Log in to ThoughtSpot from a browser.
2. Click on Data, on the top navigation bar.
Load and manage data
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 74
Figure 9: Data
3. Click Actions and select Upload schema.
Figure 10: Upload schema
4. Drag and drop your SQL file into the browser, or choose Browse Your Files to
locate it.
5. You're now in the SQL editor. Use it to view your script and make any changes.
Figure 11: Import schema
6. When ready, run your script by clicking the Execute button.
Load and manage data
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 75
7. If there are any errors, correct them and run the script again.
Change the schema
After you've created a schema and loaded data, you may find yourself wishing
you'd set things up a little differently. You can make changes to the schema, such
as changing the primary key, relationships to other tables, and sharding.
Making changes to a schema after data has been loaded and users have created
worksheets or pinboards on the tables requires care, so that you don't lose the
relationship between the objects created in ThoughtSpot and the underlying
tables. If you follow the procedures here, your tables will retain their relationships
to the objects created on top of them:
Change the primary key for a table.
Change a relationship between tables.
Change sharding on a table.
Change the primary key for a table
Use this procedure to change the primary key for a table. But use it with caution,
particularly if you are changing to a primary key for which values are not unique.
You should always take a snapshot of your database before making any schema
changes. This will allow you to revert back to the prior state if you make an error,
or something doesn't work as you expected after the schema change.
You can change the primary key of a table without having to TRUNCATE it first
and reload the data. However, changing the primary key could result in data
deletion. This is because of the upsert behavior which is applied when multiple
rows have the same primary key. This is very important to understand ahead of
time, if you are considering changing to a primary key for which values are not
unique.
Load and manage data
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 76
To change the primary key, first remove any existing primary key, and then define
a new one (if any). You do not need to truncate the tables to do this operation
beginning in version 3.2. Any dependent objects (pinboards or worksheets) will
remain intact.
To change the primary key of a table:
1. Take a snapshot.
2. Connect to the database with the ThoughtSpot SQL Command Line (TQL).
3. Drop the existing primary key (if any), by issuing a command like this example:
TQL> ALTER TABLE "cart"
DROP CONSTRAINT
PRIMARY KEY;
4. Add a new primary key, if desired:
TQL> ALTER TABLE "cart"
ADD CONSTRAINT
PRIMARY KEY ("owner_id");
5. Test that any dependent objects (pinboards, worksheets, etc.) are still working
correctly.
6. Delete the snapshot you created earlier using the command:
tscli snapshot delete <name>
Change a relationship between tables
Use this procedure to remove a relationship between tables or define a new
one. This operation works for both kinds of relationships: foreign key or generic
relationship.
You should always take a snapshot of your database before making any schema
changes. This will allow you to revert back to the prior state if you make an error,
or something doesn't work as you expected after the schema change.
To change a relationship between two tables, first remove any existing
relationship, and then define the new relationship (if any). You do not need to
Load and manage data
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 77
truncate the tables to do this operation. Any dependent objects (pinboards or
worksheets) will remain intact.
To change the relationship between tables:
1. Take a snapshot.
2. Connect to the database with the ThoughtSpot SQL Command Line (TQL).
3. Issue the command to drop the existing relationship, by issuing a command
like one of these examples:
Drop a foreign key by name, if it was given a name when it was defined:
TQL> ALTER TABLE
"sales_fact"
DROP CONSTRAINT
"FK_PO_number";
Drop a relationship by name, if it was given a name when it was defined:
TQL> ALTER TABLE "fruit_dim"
DROP RELATIONSHIP "REL_dates";
Drop the foreign key relationship explicitly, if it doesn't have a name, by
referencing the two tables that are joined. This drops all foreign keys
between the two tables:
TQL> ALTER TABLE "shipments"
DROP CONSTRAINT
FOREIGN KEY "orders";
Drop all generic relationships between two tables:
TQL> ALTER TABLE "wholesale_buys"
DROP RELATIONSHIP
WITH "retail_sales";
4. Define a new relationship, if you want to, using ALTER TABLE...ADD
CONSTRAINT...
5. Test that any dependent objects (pinboards, worksheets, etc.) are still working
correctly.
Load and manage data
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 78
6. Delete the snapshot you created earlier using the command:
tscli snapshot delete <name>
Change sharding on a table
You can change the sharding on a table or remove it altogether (creating a
replicated table) using this procedure. This procedure preserves the data within
the table.
You should always take a snapshot of your database before making any schema
changes. This will allow you to revert back to the prior state if you make an error,
or something doesn't work as you expected after the schema change.
This procedure reshards a table. This is also called redistributing or
repartitioning. You can use this method to reshard a table without losing its data
or metadata. This means that worksheets and pinboards built on top of the table
will continue to work.
You can use these steps to do any of these operations:
shard a table that was previously replicated.
change a replicated table to a sharded table.
change the number of shards to use for a sharded table.
To change the sharding on a table:
1. Take a snapshot.
2. Connect to the database with the ThoughtSpot SQL Command Line (TQL).
3. Issue the command to change the sharding using this syntax:
TQL> ALTER TABLE <table>
[SET DIMENSION | SET FACT
[PARTITION BY HASH
[(<shards>)]
[KEY(<column>)]]]
For example:
Load and manage data
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 79
To make a sharded table into a dimension table (replicated on every node),
use:
ALTER TABLE "products"
SET DIMENSION;
To make a dimension table into a sharded (fact) table or change the
number of shards, use:
ALTER TABLE "sales"
SET FACT PARTITION BY HASH (96)
KEY ("productID");
4. Test that any dependent objects (pinboards, worksheets, etc.) are still working
correctly.
5. Delete the snapshot you created earlier using the command:
tscli snapshot delete <name>
About data type conversion
You can convert the data in a column from one data type to another by issuing
a TQL command. There are some details you should be aware of when doing a
data type conversion.
Data type conversion behavior
When converting from one data type to another, any values that can not be
converted will be set to NULL. If errors occur during data type conversion, the
operation is aborted. However, you may choose to force the conversion despite
the errors. You can start TQL in allow_unsafe mode to continue with the data
conversion, at your own risk, of course! To start TQL in unsafe mode, issue this
command:
tql --allow_unsafe
Load and manage data
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 80
Multiple columns of a single table can be converted using a single TQL
command. The behavior is transactional. So for example, you would issue a
command like this example:
ALTER TABLE products
MODIFY COLUMN product_id int,
MODIFY COLUMN supplier VARCHAR(4);
Also note that changing data type has implications on the primary key and
sharding enforcement. For example, changing the data type of a column that is
part of the sharding key would lead to a redistribution of data. Then imagine that
the sharding key column contained the text values "00100", "0100", and "100",
which all map to same integer value. If this type of a column is changed from a
VARCHAR to an INT, then it would be subject to the upsert behavior on primary
keys. So in this example, only one of the three rows would be preserved.
Be aware that data type conversion will preserve the data in the underlying
database table, but there is no guarantee that any objects built on top of it
(worksheets or pinboards) will be preserved. This is because you might make
a data type change that makes a chart built on top of the table invalid (for
example a growth chart would be invalidated if the date column it depends on
were changed to a varchar column).
Supported data type conversions
In general, the data type conversions that make logical sense are supported. But
there are a few nuances you should be aware of:
When you convert from INT to BOOL, zero is converted to false, and all non-
zero values are converted to true.
When you convert from BOOL to INT, true gets converted to 1, and false gets
converted to 0.
When you convert from DOUBLE to INT, the value gets rounded.
When you convert from INT to DOUBLE, the value gets rounded.
Load and manage data
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 81
When you convert from DATETIME to DATE, the date part of value is
preserved and the time part is dropped.
When you convert from DATE to DATETIME, the time gets added as 00:00:00.
The date part of the value is preserved.
When you convert from DATETIME to TIME, the time part of the value is
preserved.
Conversion from TIME to DATETIME is not supported.
Date and time conversions
Some data type conversion require a format string. These include:
conversion from DATE/TIME/DATETIME
conversion to DATE/TIME/DATETIME
For these types of conversions, you'll use a special syntax using parsinghint and
the date format specifications supported in the strptime library function.
For the example, first create a table with a timestamp stored as a VARCHAR:
CREATE TABLE fruit_sales
(time_of_sale VARCHAR(32));
INSERT INTO fruit_sales
VALUES ('2015-12-29 13:52:39');
Now, convert the column from a VARCHAR to DATETIME, using the format %Y-
%m-%d %H:%M:%S:
ALTER TABLE fruit_sales
MODIFY COLUMN time_of_sale DATETIME
[parsinghint="%Y-%m-%d %H:%M:%S"]
Finally, convert the column back to VARCHAR:
ALTER TABLE fruit_sales
MODIFY COLUMN time_of_sale VARCHAR(32);
Load and manage data
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 82
Boolean to string conversions
Boolean to string conversions have format strings, too. You'll use parsinghint as
you do for date and time conversions. You can choose among these approaches:
Option 1: Specify string values for both true and false. Any non-matching
values get converted to null. In this example, "100" gets converted to true, and
"0" gets converted to false. "-1" gets converted to null.
ALTER TABLE db
MODIFY COLUMN s bool [parsinghint="100_0"];
Option 2: Specify a string value for true. Any non-matching value gets
converted to false. In this example, "100" gets converted to true, "-1" and "0"
get converted to false.
ALTER TABLE db
MODIFY COLUMN s bool [parsinghint="100_"];
Option 3: Specify a string value for false. Any non-matching value get
converted to true. In this example, "-1" and "100" get converted to true, and
"0" gets converted to false.
ALTER TABLE db
MODIFY COLUMN s bool [parsinghint="_0"];
String to boolean conversions
When converting from a string to a boolean, you must specify a string for true
and false. By default, a string to boolean conversion generates "true" for true,
"false" for false.
ALTER TABLE db
MODIFY COLUMN b varchar(32);
But you may override the default strings that get generated by using parsinghint,
as in this example:
ALTER TABLE db
MODIFY COLUMN b varchar(32) [parsinghint="tr_fa"];
Load and manage data
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 83
Change the Data Type of a Column
When you issue the TQL command to convert a column from one data type to
another, the conversion is handled automatically. However, you'll need to ensure
that any visualizations built on top of the table display correctly.
You should always take a snapshot of your database before making any schema
changes. This will allow you to revert back to the prior state if you make an error,
or something doesn't work as you expected after the schema change.
When changing a data type in an existing table, be aware that answers and
pinboards created on top of that table (or worksheets that include it) may
change. This is because charts and aggregations depend upon the data type. So
for example changing from INTEGER to VARCHAR could break charts that used
the numeric data type INTEGER to calculate an average or a total. Because of
this, use caution, and check all dependent objects before and after changing the
data type, to ensure that they display as intended.
To change the data type of a column:
1. Connect to the database with the ThoughtSpot SQL Command Line (TQL).
2. Issue the command to change the data type using this syntax:
TQL> ALTER TABLE <table>
MODIFY COLUMN <column> <new_data_type>;
For example:
ALTER TABLE fact100
MODIFY COLUMN product_id int;
Load data with ThoughtSpot Loader
ThoughtSpot Loader (tsload) is a common way to import data. When using
tsload, you can load larger datasets and make the loading process repeatable
through scripting.
There are several steps to loading data using ThoughtSpot Loader:
Load and manage data
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 84
1. Generate CSV files with the data to be loaded.
2. Build the schema.
3. Import CSV files with ThoughtSpot Loader.
Import CSV files with ThoughtSpot Loader
Use ThoughtSpot Loader (tsload) to load data from a CSV text file into
an existing table in ThoughtSpot. tsload accepts flags that enable you to
specify column and row separators, date or timestamp formats, null value
representations, and similar parameters. Many of these options have defaults
that you can override.
Before importing data, you need to Build the schema.
To use ThoughtSpot Loader, type the command tsload followed by the
appropriate flags. You can see the list of the flags it accepts in the ThoughtSpot
Loader flag reference or by issuing tsload -help.
tsload supports both full and incremental data loads. For incremental loads, an
upsert (insert or update) is performed. If an incoming row has the same primary
key as an existing row, it updates the existing row with the new values.
You can integrate tsload into your ETL environment for more automated data
loads. Most ETL tools provide the ability to write target data into files and
support scripted post-transformation actions that can include loading data into
ThoughtSpot. This procedure describes manually loading data, but the tsload
commands could be saved as a script:
1. Log in to the Linux shell using SSH.
2. Change to the directory where your CSV files are staged.
3. Invoke tsload, specifying the appropriate flags and your data source file:
$ tsload --target_database my_database
--target_table my_table --alsologtostderr
--empty_target --source_file my_file.csv --v 1
--field_separator "separator_char"
Load and manage data
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 85
4. Repeat the data load for each of your CSV files.
This example imports the CSV file "ssbm_customer.csv" into the
table CUSTOMER:
$ tsload --target_database SAMPLE_DB
--target_table CUSTOMER --alsologtostderr
--empty_target --source_file ssbm_customer.csv
--v 1 --field_separator "|"
Use a script to load data
If you need to load data from multiple CSV files, create a script to automate the
process. You can also use a similar script to automate recurring data feeds.
The data loading script is a text file that contains all the calls to tsload for
loading the data from your CSV files.
The example script shown here uses the cat command to read the data file, and
pipes it to tsload. When creating and testing your script, you may wish to replace
each cat with cat -10, to load only the first ten lines of each file. This allows you
to quickly run a test of your script. When the test succeeds for all the data files,
you can then remove each -10, so the complete files will load when you run the
script again.
1. Log in to the Linux shell using SSH.
2. Navigate to the directory that contains your CSV files and open a new file in a
text editor.
3. Type in the commands to load the data. This example shows commands to
load three files:
cat Players.csv | tsload
--target_database baseball --target_table "players"
--empty_target --field_separator ","
--max_ignored_rows 10 --bad_records_file bad_records.txt
--has_header_row --alsologtostderr --null_value ""
cat AllstarFull.csv | tsload
--target_database baseball --target_table "allstarfull"
--empty_target --field_separator ","
--max_ignored_rows 10 --bad_records_file bad_records.txt
Load and manage data
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 86
--has_header_row --alsologtostderr --null_value ""
cat Appearances.csv | tsload
--target_database baseball --target_table "appearances"
--empty_target --field_separator ","
--max_ignored_rows 10 --bad_records_file bad_records.txt
--has_header_row --alsologtostderr --null_value ""
4. Save the file.
5. Run the script:
$ ./load_baseball_data.sh
Bulk load files in parallel
If you have a very large data file that takes a long time to load, you can reduce
the load time by splitting it up into multiple files and loading them in parallel
using multiple invocations of tsload.
If the size of any of your data files is greater than 50 million rows, running tsload
in parallel can reduce the load time significantly. First, you'll split up your large
data file into multiple smaller files. Then create a script to load the files. You will
make your script multi-threaded by invoking multiple loader threads (between 1
and 5 are recommended).
Stage the data files in a location accessible to the node on which you'll run the
script. Usually you'll use an NAS mounted file system. Then run the script to load
the files.
If you want to optimize the load time even further, determine what the
bottleneck is and adjust your process accordingly:
If the disk I/O for reading the data files is the bottleneck, you can stage
the data files on separate NAS mounted file systems and reference them
accordingly in your script.
If the CPU on the machine you're using to run the load script is the bottleneck,
you can split the load script into the same number of parts as you have nodes
in your ThoughtSpot instance, place one script on each node, and run them
in parallel. Make sure the other nodes are able to access the data files where
Load and manage data
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 87
they are staged. Note that running the load script on separate nodes will
put the data on all the nodes, just as when you run the script on a single
node. Running the script on all the nodes at the same time just lets you take
advantage of CPU power of each node for hashing data files.
Let's say you have 30 days of data in 30 files, one for the data
collected on each day. Each day's data file contains 10 million rows,
for a total of 300 million rows of data. You want to load the whole
month of data. For this example we'll have 5 loader processes, each
one handling 6 days of data.
Here is a sample script you could use the load the data files in
parallel:
/* Script load_script.sh, loads 30 days of data in parallel */
#!/bin/bash
pidlist=""
cat day1.csv day2.csv day3.csv day4.csv day5.csv day6.csv | tsload
--target_database sales --target_table SALES_FACT --
max_ignored_rows 10
--bad_records_file ./SALES_FACT.bad --date_format %Y-%m-%d
--date_time_format "%Y-%m-%d %H:%M:%S" --source_data_format
delimited
--field_separator "|" --null_value "" --enclosing_character "\""
--boolean_representation 1_0 &
pidlist="$pidlist $!" &
cat day7.csv day8.csv day9.csv day10.csv day11.csv day12.csv |
tsload
--target_database sales --target_table SALES_FACT --
max_ignored_rows 10
--bad_records_file ./SALES_FACT.bad --date_format %Y-%m-%d
--date_time_format "%Y-%m-%d %H:%M:%S" --source_data_format
delimited
--field_separator "|" --null_value "" --enclosing_character "\""
--boolean_representation 1_0 &
pidlist="$pidlist $!" &
cat day13.csv day14.csv day15.csv day16.csv day17.csv day18.csv |
tsload
--target_database sales --target_table SALES_FACT --
max_ignored_rows 10
--bad_records_file ./SALES_FACT.bad --date_format %Y-%m-%d
--date_time_format "%Y-%m-%d %H:%M:%S" --source_data_format
delimited
Load and manage data
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 88
--field_separator "|" --null_value "" --enclosing_character "\""
--boolean_representation 1_0 &
pidlist="$pidlist $!" &
cat day19.csv day20.csv day21.csv day22.csv day23.csv day24.csv |
tsload
--target_database sales --target_table SALES_FACT --
max_ignored_rows 10
--bad_records_file ./SALES_FACT.bad --date_format %Y-%m-%d
--date_time_format "%Y-%m-%d %H:%M:%S" --source_data_format
delimited
--field_separator "|" --null_value "" --enclosing_character "\""
--boolean_representation 1_0 &
pidlist="$pidlist $!" &
cat day25.csv day26.csv day27.csv day28.csv day29.csv day30.csv |
tsload
--target_database sales --target_table SALES_FACT --
max_ignored_rows 10
--bad_records_file ./SALES_FACT.bad --date_format %Y-%m-%d
--date_time_format "%Y-%m-%d %H:%M:%S" --source_data_format
delimited
--field_separator "|" --null_value "" --enclosing_character "\""
--boolean_representation 1_0 &
pidlist="$pidlist $!" &
wait $pidlist
Call your script using a command like:
nohup bash ./load_script.sh > master_log.txt &
tail -f master_log.txt
Constructing your script in this way will execute all the commands
in the background, and output to the file master_log.txt. You'll
see a running status as the commands in the script execute.
After the script completes, you can check the log file for detailed
information, such as the number of rows that loaded successfully.
Delete a data source
When you want to delete a data source, you first need to handle any dependent
objects that have been built on top of it. You can easily see these dependencies,
and choose how to handle them before deleting the data source.
Load and manage data
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 89
There are two separate ways to delete a data source. Both of these methods will
check for dependencies and warn if any are found:
Delete or change a table in TQL describes the dependency checking that
occurs when deleting or changing a table using TQL.
Delete a data source from the browser describes the dependency checking
that occurs when deleting a data source through the ThoughtSpot application.
You can also Check dependencies in the browser before attempting to delete a
data source.
Delete a data source from the browser
You can delete data sources from the browser, as long as they were not created
by an administrator through tsload or Data Connect.
You can delete data sources from the browser if they were created from the
browser. These types of data sources include:
Data imported from the browser.
• Worksheets.
ThoughtSpot checks for dependencies whenever you try to delete a table or
worksheet.
1. Click on Data, on the top navigation bar.
Figure 12: Data
2. Check the box next to the name of the data source you want to delete.
3. Click the delete icon.
Load and manage data
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 90
Figure 13: Delete a data source
4. If you attempt to delete a data source with dependent objects, the operation
will be blocked. You will see a list of dependent objects with links.
Figure 14: Warning message when trying to delete an object with
dependencies
5. Click on a dependent object to modify or delete it.
If you want to remove the dependency by modifying the dependent object,
you'll need to remove all search terms or columns that refer back to the data
source you are trying to delete.
Load and manage data
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 91
6. When all dependencies have been removed, you will be able to go back and
delete the data source.
Check dependencies in the browser
You can see all of the dependencies for any data source (worksheet or table) on
the Manage Data page.
To view dependent objects for a data source:
1. Click on Data, on the top navigation bar.
Figure 15: Data
2. Click the name of the data source whose dependencies you want to view.
3. Click Dependents.
You will see a list of the names of the dependent objects (worksheets and
pinboards), and the columns they use from that data source. You can use this
information to determine the impact of changing the structure of the data
source or to see how widely it is used.
Figure 16: List of dependent objects
4. Click on a dependent object to modify or delete it.
Load and manage data
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 92
If you want to remove the dependency by modifying the dependent object,
you'll need to remove all search terms or columns that refer back to the data
source you are trying to delete.
5. When all dependencies have been removed, you will be able to go back and
delete the data source.
Delete or change a table in TQL
Just as attempting to delete an object in the web browser warns of any
dependencies, making changes using ThoughtSpot SQL Command Line (TQL)
that modify or delete tables warns of dependencies.
When you enter a TQL statement, you will be warned of possible dependency
consequences with a prompt asking if you’d like to proceed. This should make
you feel safe issuing TQL commands, even commands like dropping a table.
If TQL is run using the flag --allow_unsafe, your statements will always execute
without this warning. Note that when running TQL from a script, you will
need to decide what behavior you want if the script contains changes that
affect dependent objects. If you want the script to run even if objects with
dependencies are afffected, run it using this flag, for example:
cat safest_script_ever.sql | tql --allow_unsafe
If you do not run the script using the flag, it will fail if any of its commands might
cause problems with dependent objects.
TQL actions with possible dependency consequences include:
Change, add, or remove a primary key.
When changing or adding a primary key, if the key in question is not unique
in the data it may cause deletion of rows, because of upserts occurring
when duplicate primary keys are found.
When changing or removing a primary key, incoming foreign key
relationships will be broken.
Load and manage data
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 93
Change a column datatype.
Add a relationship or foreign key.
Drop a relationship or foreign key constraint.
Change or add a sharding key.
Drop a table, schema, or database.
When issuing one of the above commands, you will see a warning message
similar to this:
TQL> ALTER TABLE table1
DROP CONSTRAINT PRIMARY KEY;
WARNING: This operation will break the Foreign Key relationship “products”
with table “sales”, which will break 34 user-visible visualizations and
2 Worksheets. We recommend taking a snapshot before performing this operation.
Do you wish to proceed? (yes/no).
About the Schema Viewer
There is a schema viewer in ThoughtSpot which lets you see your database
schema in the web browser. You can see tables and worksheets and their
relationships. The Schema Viewer is interactive, so you can configure it to show
just what you want to see.
Bringing up the Schema Viewer
You can access the Schema Viewer from the Manage Data screen by clicking
Actions, and selecting View Schema.
Load and manage data
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 94
Figure 17: Access the Schema Viewer
When viewing the schema, you can filter the tables shown similarly to how you
filter data sources. The list of tables, worksheets, and imported data on the
left includes only those objects you want to see. Clicking on one of the objects
brings it to the middle of the viewer and highlights it. You can drag the objects
around in the viewer.
Figure 18: Schema Viewer filters
Load and manage data
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 95
How to use the Schema Viewer
You can use the Schema Viewer to find out information like:
What is the relationship between two tables?
What tables make up this worksheet, and how are they joined?
The schema viewer shows joins between tables, join directionality, and join type
(whether they are Foreign Key > Primary Key, relationship joins, or joins defined
by users through the web interface). Use the Table list to find a specific table or
worksheet.
Worksheet view
For worksheets, you can also click on one to view the worksheet. The worksheet
view shows the following information:
All tables in the worksheet, and the relationships between these tables.
Source columns for all columns of a worksheet.
Keys and definitions for each relationship, as well as join paths and types.
Columns that are derived from formulas.
Correct join paths for newly created chasm trap worksheets. Existing chasm
trap worksheets will not show the correct join paths.
Figure 19: Worksheet view example
Load and manage data
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 96
The worksheet view does not work for aggregated worksheets, but does works
for worksheets built on top of aggregated worksheets.
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 97
Chapter 3: Model, link, and tag your data for
searching
Model, link, and tag your data for searching
Topics:
Model the data for
searching
Link tables using
relationships
About stickers
Modeling, tagging, and adding links between your
data sources can make the data even easier to
search.
You can start searching your data without doing
any modeling, creating relationships, or tagging. But
putting some thought into these will make the data
even easier and more intuitive to search for your end
users.
Model, link, and tag your data for searching
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 98
Model the data for searching
After loading data, you will create the semantic data model that makes the data
easy to search. You can make these settings either through the Web interface or
by using the model file.
About data modeling
Data modeling is a very lightweight process compared to what you may have
experienced in other tools. It enables you to change some of the settings (or
properties) of the data so that it becomes more searchable.
You can change these settings in two ways, both of which have the same effect:
1. Model data in the Web interface.
2. Model data in bulk in the model file.
Choose the model file method if you want to make many changes in one bulk
operation.
Modeling the data allows you to give the columns more search friendly names or
predefine how they can be explored and aggregated. When you load data, most
of this data modeling metadata is set up for you automatically. However, since
you know your data best, you can adjust the modeling settings to improve the
experience for your users.
Model data in the ThoughtSpot application
To make modeling settings for a table you've just loaded, or to make a quick
change to existing settings, use the ThoughtSpot web interface. You can adjust
the Columns settings from the data management listing.
You can change all the same data model settings here as in the model file. This
method is easier and faster, unless you need to make many settings in bulk. In
that case, using the model file is recommended.
Model, link, and tag your data for searching
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 99
Click on the Data icon, to get to the data management listing. Then click on the
name of your data source.
Figure 20: Select a data source
This brings up the Columns screen, where you'll make your modeling settings.
Modify the column settings by clicking on them, typing or selecting the setting
you want, and then saving your changes. Descriptions of the different settings
are listed in Data modeling settings.
Figure 21: Edit modeling settings in the Columns screen
Model, link, and tag your data for searching
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 100
Model data in bulk in the modeling file
Properties of your data such as Column Names, Column Visibility, Column and
Data Definition, Column Rank, etc. are defined in the modeling file. Use the
model file when you want to edit these settings in bulk.
If you just want to make one or two quick changes, it will be faster to Model data
in the ThoughtSpot application instead.
Data modeling is a three steps process:
1. Download the model file.
2. Change settings in the model file.
3. Upload the edited model file.
You can edit the data modeling file using Microsoft Excel, vi/vim, or a compatible
tool. In each row of the modeling file, all the data properties corresponding to a
column from your data are listed. You can modify many of these properties by
typing in the new value.
Remember these important guidelines when editing the model file:
Do not modify any value in a column which contains DoNotModify in the field
under the column heading.
Make sure to keep the file in the same format as it had when you downloaded
it (CSV text file).
Download the model file
Before you can make changes to the model file, you need to download it. Then
you can edit it using Microsoft Excel, vi/vim, or a similar text editing tool.
First, you'll download the model file, and then make changes to the appropriate
Data modeling settings. To obtain the model file:
1. Log in to ThoughtSpot from a browser as an Administrator user.
2. Click on the Admin icon, on the top navigation bar.
Model, link, and tag your data for searching
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 101
Figure 22: The Admin icon
3. Click on Business Data Model.
4. Click Download model.xls.
Figure 23: Download the model file
Change settings in the model file
After downloading the model file, you'll make changes to the settings using this
procedure. Then you will upload the file again to apply your changes.
You can edit any of the values in the model file, except for those where the
words DoNotModify appear below the column header. To make changes in the
model file:
1. Open the model file you downloaded (model.xls) in Excel, vi/vim, or a text
editor.
If you are using Excel, you may see a warning message. Click Yes to
proceed.
Model, link, and tag your data for searching
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 102
Figure 24: Warning when opening the model file
If your model file includes multi-byte characters, edit the file using vi or
vim. This is because model files containing multi-byte characters must be
saved as UTF-8 encoded. Otherwise you won't be able to upload them after
making your edits.
2. Find the column you want to modify. Descriptions of the meanings of the
columns are listed in Data modeling settings.
3. Select the value you want to change.
4. Type in the new value.
5. After making all your changes, save the model file.
If you are using Excel, you will see a message. Click Yes to save the file.
Figure 25: Warning when saving the model file
If your model file includes multi-byte characters, edit the file using vi or
vim. This is because model files containing multi-byte characters must be
saved as UTF-8 encoded. Otherwise you won't be able to upload them after
making your edits.
Upload the edited model file
After you have made changes to the modeling file, you must upload it back to
ThoughtSpot before the changes will take effect.
Model, link, and tag your data for searching
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 103
Save the model file in the same format as it was when you downloaded it. If you
are using Microsoft Excel to edit the file, you will see a warning when attempting
to save it. Click Yes and save the file.
Figure 26: Warning message when saving the model file
To upload the model file:
1. Log in to ThoughtSpot from a browser as an Administrator user.
2. Click on the Admin icon, on the top navigation bar.
Figure 27: The Admin icon
3. Click on Business Data Model.
4. Click BROWSE YOUR FILES to upload the model.xls file, or drag and drop it in
the zone.
Model, link, and tag your data for searching
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 104
Figure 28: Upload the model file
If you receive an error message upon uploading the file, check that it does not
include any multi-byte characters (i.e. Japanese or other multi-byte language
characters). If it does, you'll need to download the file again and make your
edits using vi or vim.
Note that you may choose to remove all the rows you have not changed from
the model file before uploading it. If you upload a model file that includes only
the changed rows, you won't lose any of the pre-existing model file settings.
This is a good option if your model file is causing an error on upload, but you
aren't sure where in the model file the problem is.
As soon as the file is uploaded, ThoughtSpot performs any necessary re-
indexing for you automatically. Your new settings will be reflected within a few
minutes.
Data modeling settings
Whether you are changing data modeling settings using the modeling file or the
Web interface, the settings and their accepted values are the same.
Modeling settings
You can change these settings in two ways, both of which have the same effect:
Model, link, and tag your data for searching
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 105
1. Model data in the Web interface.
2. Model data in bulk in the model file.
Choose the model file method if you want to make many changes in one bulk
operation.
This index lists the editable data modeling settings:
Table 14: Modeling settings
Setting name Description Can be modeled in a
Worksheet
Column Name Sets the name of the column to
be used in searches.
Yes
Description Adds a text description of
what the column contains.
Yes
Data Type Read only. Shows the column's
data type.
Column Type Sets the type of column, either
ATTRIBUTE or MEASURE.
Additive Controls the type of
aggregations that will be
available for a column.
Aggregation Sets the default aggregation
type for MEASURE columns.
Hidden Sets the visibility of a column.
Synonyms Adds synonyms that can be
used in the search bar to refer
to a column.
Yes
Index Type Sets the type of index that will
be created for a column.
Geo Type Enables a column to be used in
GeoMap visualizations.
Model, link, and tag your data for searching
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 106
Setting name Description Can be modeled in a
Worksheet
Priority Changes the priority of a
column in search suggestions.
Number Format Specifies the format to use
when showing a numeric value
in the column.
Date Format Specifies the format to use
when showing the dates in a
column.
Currency Format Specifies the format to use
when showing the currencies in
a column.
Attribution Dimension Only applies to tables that join
over a Chasm Trap. Designates
whether the tables depend on
this column for attribution.
Data modeling for worksheets
For worksheets, only some of the settings can be modified, whether you
are using the modeling file or the Web interface. The editable settings for
worksheets are:
• Name
• Description
• Synonyms
If you want to change any of the settings that cannot be modified in a
worksheet, you need to make your changes to the underlying table instead, and
they will be reflected in all worksheets that use the table.
Model, link, and tag your data for searching
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 107
Change the column name
You can change the text that is shown for the column names in ThoughtSpot to
make the names more meaningful to users. The column name is what they will
type to add that column to their search.
Column Name is the name that will be displayed to users for that column in
ThoughtSpot. The default is the name you gave the column when you defined
the table in the database or imported the CSV file from the browser.
1. Find the Column Name for the column whose name you want to change.
2. Type in the new column name, as you want it to appear in ThoughtSpot.
3. Save your changes.
Add a column description
You can provide a description for a specific column, to provide additional
information for users about the data it contains.
Description contains an optional description for the corresponding column.
When a user hovers over the column, a tooltip will show this description.
To create a column description:
1. Find the column you want to add a description for, and select its Description.
2. Type in the column description, as you want it to appear in ThoughtSpot.
3. Repeat for all columns where you want to add a description.
4. Save your changes.
Change the column type (ATTRIBUTE or MEASURE)
Columns have a Column Type based on the kind of data they store. This is set
automatically upon defining the table, but in some cases, you may want to
change the type.
There are two types of columns:
ATTRIBUTE contains a property, like name, address, or id number.
Model, link, and tag your data for searching
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 108
MEASURE contains a numeric value that can be compared in a meaningful
way using math, such as a count or measurement.
When a new table is created, the default column type is set according to the
data type defined for each column. By default, columns with the numeric data
types (FLOAT, DOUBLE, INT, or BIGINT) are assigned the type MEASURE.
Columns with VARCHAR, BOOL, or date/time data types are assigned the type
ATTRIBUTE.
Usually the default setting for column type works fine. But occasionally you'll
need to change a MEASURE to an ATTRIBUTE. Examples of numeric values for
which mathematical operations are not meaningful include:
ID numbers
Key values that are primarily used for joining tables
Product number or SKU
Sports team member jersey number
Year, when separate from a date (e.g. 1999, 2000)
You can change the column type by modifying the Column Type setting.
1. Find the Column Type for the column whose type you want to change.
If you are editing the model file, use the setting Type.
2. Change it to either MEASURE or ATTRIBUTE.
3. Save your changes.
Change the additive setting for a column
Your data may contain a column with a numeric data type that you have
defined as an ATTRIBUTE rather than a MEASURE (such as age). You can allow
aggregations on the values by changing the value of the Additive setting.
The setting Additive only applies to columns that have both:
A numeric data type (FLOAT, DOUBLE or INTEGER) or a date data type
(DATE, DATETIME, TIMESTAMP, or TIME).
Model, link, and tag your data for searching
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 109
The type ATTRIBUTE in the modeling file.
When an result is returned from a search, you also get a list of options for each
column on the left side of the screen. For numeric columns with the Additive
setting of NO, the count listing includes:
UNIQUE COUNT OF
TOTAL COUNT OF
Figure 29: Default options with Additive set to “NO”
If you want aggregates to be added to the options for these columns, you
need to change the Additive setting to YES. After you make this change, these
options will be offered:
TOTAL OF
AVG OF
STD DEVIATION OF
VARIANCE OF
TOTAL COUNT OF
UNIQUE COUNT OF
MIN OF
MAX OF
Model, link, and tag your data for searching
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 110
Figure 30: Options for numeric columns with type ATTRIBUTE and Additive set
to YES
To change the Additive setting:
1. Find the column whose additive setting you want to change and select its
Additive value.
2. Change the value to one of these:
YES or NO, if using the Web interface.
TRUE or FALSE, if using the model file.
3. Save your changes.
Change the aggregation setting for a column
All types of aggregations can be performed on MEASURE columns, and some
aggregations can be done on ATTRIBUTE columns. You can change the default
aggregation type to make combining data more intuitive and faster.
To aggregate a column without having to enter the aggregation type explicitly
in your searches every time, you can set a default Aggregation for that column.
Note that any non-numeric columns (columns of type ATTRIBUTE) will have a
Model, link, and tag your data for searching
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 111
default aggregate type of NONE, which you can change to one of the supported
aggregation types.
Table 15: Supported aggregate types
Aggregate type Description
NONE Does no aggregation. This is the default for ATTRIBUTE
type columns.
SUM Adds the values together and returns the total. This is the
default for MEASURE type columns.
AVERAGE Calculates the average of all the values.
MIN Calculates the minimum value.
MAX Calculates the maximum value.
STD_DEVIATION Calculates the standard deviation of all the values.
VARIANCE Calculates the variance of all the values.
COUNT Calculates the total number of values.
COUNT_DISTINCT Calculates the total number of distinct values.
1. Find the column whose default aggregation type you want to change, and
select its Aggregation.
If using the modeling file, use the AggregationType setting.
2. Select the new default aggregation type.
3. Save your changes.
Supposed there is a table containing data about athletes on a
sports team. The data contains some numerical values, including
points scored, salaries, and jersey numbers for each of the players.
Because jersey number is an INTEGER, it would become a column
of type MEASURE (not ATTRIBUTE). So it will aggregate, by
default. But you may want to make its aggregation type NONE
Model, link, and tag your data for searching
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 112
instead. This ensures that search results that include jersey number
will not attempt to compare or aggregate those values in a way
that is not meaningful.
Hide a column
You can hide columns from users in ThoughtSpot without dropping them from
the database. This is common when the data contains identifier columns that are
used to join tables, but which do not have any meaning to users.
By default, all columns in a data source will be shown in ThoughtSpot. To hide
the column names, change the Hidden setting.
As the number of columns in the dataset increases, the search experience
requires more effort. Users have to navigate through larger numbers of columns
to choose the correct one. There might also be some columns in the dataset that
you don’t want to expose to the users. To hide these columns, you can set the
value of the Hidden column to YES.
1. Find the Hidden setting for the column you want to hide, and set its value to
YES.
If you are editing the model file, use the Hide setting, and set it to TRUE.
2. Save your changes.
Create synonyms for a column
If you want to allow searching using more than one name for a column, you can
create synonyms for it. This is helpful when different departments refer to the
data using different terminology, for example.
When users search a data source, they might try typing different words to try
to retrieve a particular column. This could be due to different groups in your
organization using different terms for the same data. Or maybe your users just
intuitively use different words when searching for that item. Using synonyms
Model, link, and tag your data for searching
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 113
allows them to access the data even if the term they choose isn't the same as the
actual column name.
You can set column synonyms for columns in tables, user imported data, and
worksheets. Note that the returned table or chart uses the actual column name,
but the search bar will still reflect the term the user typed in (the synonym).
To create a column description:
1. Find the column for which you want to add synonyms, and select its
Synonyms.
2. Type in a comma separated list of the synonyms you want to add. If a
synonym is more than one word, it must be enclosed in double quotes.
If you are using the Web interface, you would type:
profit,"gross profit"
If you are using the model file, the list of synonyms must be enclosed in
square brackets. For example:
[profit, "gross profit"]
3. Save your changes.
Change the index type for a column
ThoughtSpot indexes column names and unique column values. The indexes
are used to dynamically generate suggestions in the search bar when typing a
search.
You can change the way a column is indexed by modifying its Index value in the
modeling file, to influence the suggestions that will appear for that column. The
default behavior of indexing is as follows:
All column names are indexed using their ColumnName value.
Values for columns with the column type of MEASURE are not indexed.
Values for columns with the data type of DATE are not indexed.
Model, link, and tag your data for searching
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 114
Columns that contain a large amount of free-form text (i.e. the number of
characters in more than a few of the fields is more than 50) are indexed as
PREFIX_ONLY by default.
Short strings (like a firstname column) are indexed using
PREFIX_AND_SUBSTRING by default, which indexes both prefix and
substrings.
It is not recommended to change the indexing for columns with very large free
text values. These should not to be indexed, because indexing on these values is
not useful and may generate confusing suggestions.
You can override the default behavior by editing the modeling file to change the
Index value for any columns that should be indexed differently. There are several
different supported index types:
Table 16: Supported index types
Index type Description
DEFAULT This is the default value. The default indexing
behavior will apply to the column values,
depending on their type. PREFIX_AND
SUBSTRING for short values and PREFIX_ONLY
for long values and free-form text.
DONT_INDEX Prevents indexing on the column values.
PREFIX_AND_SUBSTRING Allows full indexing such that prefix and sub-
string search both work for the column values.
PREFIX_ONLY Allows indexing such that only prefix search
works for the column values.
PREFIX_AND_WORD_SUBSTRING Allows indexing such that only prefix search
works for each word of a multi-word string, for
the column values.
1. Find the column whose index type you want to modify, and set its Index Type.
Model, link, and tag your data for searching
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 115
If you are using the model file, double click in the Index cell, and type in the
index type you want to use.
2. Save your changes.
Consider a column in which there are four values “ThoughtSpot”,
“Thought”, “Spot” and “Thought Spot”. If you search for “sp”,
depending on the setting for indexing, the column value search
result suggestions will vary:
Table 17: Example of the effects of indexing on search bar
suggestions
Index field value Search bar suggestions
DEFAULT “ThoughtSpot”, “Spot” and “Thought
Spot”
DONT_INDEX No suggestions.
PREFIX_AND_SUBSTRING “ThoughtSpot”, “Spot” and “Thought
Spot”
PREFIX_ONLY “Spot”
PREFIX_AND_WORD_SUBSTRING “Spot” and “Thought Spot”
Add a geographical data setting for a column
Certain attribute columns that contain location data can be used to create
GeoMaps. ThoughtSpot supports Latitude, Longitude, Zip Code, US States, US
Counties, Countries, and select international sub-nation regions.
You can designate a column as "Geo" by editing the GeoType column in the
modeling file or the Columns setting screen.
Columns that can be designated as "Geo" columns need to contain text
(VARCHAR) data unless they contain latitude/longitude data. Latitude and
longitude columns can contain numeric data (DOUBLE) or text.
Model, link, and tag your data for searching
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 116
If you are using a column with the data type DOUBLE for latitude and longitude,
you will also need to change the following settings for those columns:
Set Column Type to ATTRIBUTE.
Set Additive to NO.
Set Aggregation Type to NONE.
1. Find the GeoType for the column that contains the geographical data.
2. Change the value to the appropriate GeoType, depending on the kind of data
the column contains.
Table 18: Data that uses geo charts
GeoType Description Type: Example
COUNTRY_REGION Countries name: United States
long name: United States
name_sort: United States of America
abbreviation: U.S.A.
adm0_a3: USA
adm0_a3_is: USA
adm0_a3_us: USA
admin: United States of America
brk_a3: USA
brk_name: United States
formal_en: United States of America
iso_a2: US
iso_a3: USA
iso_n3: 840
COUNTY Counties in the
United States
santa clara county
pike county, ohio
pike county, OH
STATE_PROVINCE States in the
United States
name: California
US Postal Service abbreviation: CA
Model, link, and tag your data for searching
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 117
GeoType Description Type: Example
LATITUDE Must be used with
LONGITUDE
• 37.421023
• 1.282911
LONGITUDE Must be used with
LATITUDE
122.142103
• 103.848865
ZIP_CODE Zip codes and zip
codes +4 in the
United States
po_name: MT MEADOWS AREA
ZIP: "00012"
zip2: 12
Other Sub-nation Regions Administrative
regions found in
countries other
than the United
States
• bremen
• normandy
west midlands
3. If your data includes latitude and/or longitude columns that are stored as a
numeric data type (DOUBLE), make these changes for those columns:
a) Change the Type or ColumnType to ATTRIBUTE.
b) Change Additive to NO/FALSE.
4. Save your changes.
Set the search suggestions priority for a column
You can change the priority that determines which columns are shown in search
suggestions and the order in which they appear.
The value of Priority determines the priority order (rank) in which a particular
column (and its values) appears in the search dropdown. You can push a column
up in the order (increase the rank) by increasing its Priority value. A higher value
(like 2) will cause the corresponding column and its values to appear higher up
in the search dropdown than columns with lower value (like 1). By default, the
Priority value is set to “1” for all columns.
Model, link, and tag your data for searching
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 118
You should only use numbers between 1-100 in this field. It is recommended to
assign values in increments of 5 or 10, to allow room to assign values to other
columns that may be inserted later on.
1. Change the value to a number between 1 and 100. Use a higher value if you
want to boost the search priority, and a lower value if you want to lower it.
2. Save your changes.
Set the format to use when showing numbers
You can set a format for how numbers are displayed in tables and charts. For
example, you can display numbers with a different number of digits after the
decimal point, based on the data modeling setting Format Pattern.
You can use any of the supported number formats for delimiters and number
of digits to show using Java Decimal Notation. Currency symbols are not
supported.
The default values are:
#,### for integer data types (INT, BIGINT).
#,###.00 for decimal data types (DOUBLE and FLOAT).
These are some examples of formats you can use:
Table 19: Number format examples
Stored Value Format Pattern Display Value
12345.6789 #,##0.## 12,345.68
12345.6789 #,##0.### 12,345.68
12345.6789 #,##0.00000 12,345.68
12345.6789 #,##0 12,345
12345.6789 #,##0.00 12,345.68
12345 #,##0.## 12,345
12345 #,##0.00 12,345.00
Model, link, and tag your data for searching
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 119
To change the date format used to display a column's values:
1. Find the Format Pattern for the column whose display format you want to
change.
2. Change it to the format you want to use.
3. Save your changes.
Locale-based number formatting
Number formatting is set by default based on your browser locale setting. This
has been set in order to accommodate users in various geographical locations,
primarily in the US and Europe regions.
For example, if you are using ThoughtSpot in the US, the number formatting
should look like this: xxx,xxx.xx. And in Europe, it should look like this: xxx.xxx,xx.
Set the format to use when showing dates
You can set a format for how dates are displayed in tables and charts. For
example, you can display dates in a standard European or US format based on
the data modeling setting Format Pattern.
You can use any of the supported date formats listed in the Date and time
formats reference. These are some examples of formats you can use:
• MM/dd/yyyy
MMM (for abbreviated month format)
• DD/mm/yyyy
MM/dd/yyyy HH:mm
DD/mm/yyyy HH:mm
To change the date format used to display a column's values:
1. Find the Format Pattern for the column whose display format you want to
change.
2. Change it to the format you want to use.
3. Save your changes.
Model, link, and tag your data for searching
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 120
Set the format to use when showing currencies
You can set a format for how currencies are displayed in tables and charts when
using the ThoughtSpot Data API or Embedding. For example, you can display
currencies in a standard European Euro or US Dollar format based on the data
modeling setting Currency Type.
When you specify the currency type of your data on the Manage Data page,
your currency data will only display the correct format and currency code in
the embedded use case. Currency specific symbols are available in the non-
embedded use case as well, but they are not localized. All users are treated as
if they are in en-US locale unless they are in embed mode and their browser
configuration tells ThoughtSpot that they are in some other locale. For example,
100 Polish Zloty appears as 100zł to a user in Poland, but without localization
enabled, it appears as PLN 100.
This subtle difference can be seen when you use the REST API. See the
ThoughtSpot Application Integration Guide for more information on the API.
To change the currency format used to display a column's values:
1. Find the Currency Type for the column whose display format you want to
change.
2. Click on it to open the Specify Currency Type menu.
Figure 31: Specify Currency Type
3. Select one of the following ways you would like to change the format.
Model, link, and tag your data for searching
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 121
Infer From Browser
Your currency data will be modeled upon the locale of your browser
setting.
From a column
Your currency data will be modeled upon the existing currency information
in the selected column.
Note: This option is disabled if there is no VARCHAR column to
choose from.
Specify ISO Code
Your currency data will be modeled upon your selection from the available
currency code choices.
4. Click Ok to save your changes.
Change the Attribution Dimension setting of a column
The Attribution Dimension setting applies only to tables that are related through
a chasm trap. If your schema does not include these, you can ignore this setting.
The Attribution Dimension setting only applies to tables that join over a Chasm
Trap. By default, the attribution dimension setting will be set to Yes, but you can
override that by setting the column's attribution dimension property to No, as
described here.
In the classic chasm trap two fact tables are related through a shared dimension
table. When the two fact tables are joined, the shared column(s) in the
dimension table are used to attribute rows in one fact table to match with rows
in the other fact table. Usually, all goes well using this method. But sometimes
an incorrect or illogical attribution can result, because the column chosen is not
meaningful for performing this attribution. If you are seeing unexpected results
in searches that include tables across a chasm trap, this setting is for you.
Model, link, and tag your data for searching
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 122
Here is an example of a column that is not an attribution dimension. Suppose you
have two fact tables, Wholesale Purchases and Retail Sales, that share a common
dimension Date. In this case, the date column in the Date dimension should not
be used for attribution, since unrelated rows in both of the fact tables could
share the same row in the Date table. If I bought oranges wholesale on April
25, 2005 and made a retail sale of apples on the same day, there is no logical
relationship between those two events. Combining the two events using the date
they share will not create any meaningful information.
Figure 32: Example where a column is not an attribution dimension
If matching rows in two fact tables over a chasm trap depends on the values
in a column contained in a dimension table, that column is known as an
attribution dimension. In this example, the Product ID column in the Products
dimension table is an attribution dimension. For rows where the Product ID in
the Wholesale Purchases and in the Retail Sales tables is a match, those rows
are logically related in a meaningful way. They can be combined in charts and
reports to produce a logical, expected outcome.
Model, link, and tag your data for searching
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 123
Figure 33: Attribution dimension example
To designate a column as not being an attribution dimension (i.e. not producing
any meaningful attribution across a chasm trap):
1. Find the column that is not an attribution dimension and select its Attribution
Dimension.
2. Set the value to No.
If you're using the modeling file, set it to FALSE.
3. Save your changes.
Link tables using relationships
You can link tables by creating relationships between their columns. Linked
tables can be searched together or combined into a worksheet for easy
searching. Tables that have no relationship between their columns can not be
combined in a single search.
There are two ways to create relationships between tables:
1. Create a constraint using TQL.
2. Create a relationship through the web interface.
The two methods create the same kind of relationship both from an end user
perspective and an administrative perspective.
Model, link, and tag your data for searching
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 124
Both types of relationships exist within the database. You can also generate a
script through TQL that contains all relationships, whether create via the web
interface or in TQL.
Relationships created through either method can be managed either via
TQL or by going to the Relationships page when viewing data in the Date
Modeling section in the ThoughtSpot application. You can view, modify, or delete
relationships in either place.
You may create relationships using a mixture of TQL and the web interface, but
the relationships you create cannot form a circular relationship, or "cycle". If
you attempt to create a relationship that would complete a cycle, you will see
a message saying that the relationship could not be added because it conflicts
with another existing relationship.
Create a relationship
You can quickly create a relationship (or link) between tables that allows you to
combine them in a single search. Choose a column to join on that both tables
contain (e.g. employee ID or product key).
You must have either administration privilege or modify access permission to the
columns to create a relationship.
When creating a link between the columns in two data sources, the columns
being linked must have the same data type, with the same meaning. That is, they
must represent the same data. Normally, you'll make this kind of link from a fact
table column to a column in a dimension table that uniquely identifies a logical
entity in your data such as Employee ID for a person, Product ID for a product, or
Date Key for a specific date in a date lookup table.
To create a relationship through the Web interface:
1. Click on Data, on the top navigation bar.
Model, link, and tag your data for searching
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 125
Figure 34: Data
2. Click on the name of the data source you want to link from.
3. Select Relationships.
Figure 35: Select Relationships
4. If there are already some existing relationships, scroll down and click Add
Relationship. Otherwise, continue to the next step.
5. Click on Source Column and select the column you want to link in the source
table.
Model, link, and tag your data for searching
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 126
Figure 36: Select a Source Column
6. Under Destination Table find and select the table that you want to link to.
Figure 37: Find and select a Destination Table
7. Click on Destination Column and select the column you want to link to in the
destination table.
Model, link, and tag your data for searching
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 127
Figure 38: Select the Destination Column
8. Click Add Key to add the link.
9. Name your relationship and optionally give it a description.
Figure 39: Name the Relationship
10.Click Add Relationship.
11. Repeat these steps for creating a link until all the links you want to make for
your table have been created.
Model, link, and tag your data for searching
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 128
Delete a relationship
If you created a relationship (link) between tables using the Web interface, you
can also delete it from the Web interface. But if the relationship was created
using TQL, you must also use TQL to delete it.
You must have either administration privilege or modify access permission to the
columns to delete a relationship.
To delete a relationship using TQL, use an ALTER TABLE...DROP FOREIGN KEY...
statement.
To delete a relationship from the Web interface:
1. Click on the Data icon on the top navigation bar and then on Worksheets.
Figure 40: Go to the worksheet list
2. Click on the name of the data source you from which you want to remove the
relationship.
3. Select Relationships.
Model, link, and tag your data for searching
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 129
Figure 41: Select Relationships
4. Find the relationship you want to delete, and click Delete.
Figure 42: Delete a relationship
About stickers
You can create stickers to make it easier for people to find data sources and
pinboards.
About stickers
Stickers enable you to create categories for classification of objects, including
pinboards, answers, data sources, and worksheets. Only administrators can
create stickers, and they are global in scope. This means that everyone can see
the stickers and use them to tag objects. They can also filter lists of objects by
Model, link, and tag your data for searching
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 130
sticker. Stickers are often used to designate subject areas, such as sales, HR, and
finance, but you can use them any way you like.
This is the workflow for using stickers:
1. Only administrators can create stickers.
2. Anyone can Apply a sticker.
3. Anyone can Filter by a sticker.
Create stickers
You can create stickers for use in tagging pinboards, worksheets, and data
sources.
Only administrator users can create stickers. Anyone can apply the stickers you
create, or use them as filters when selecting from a list of sources or pinboards.
To create a sticker:
1. Navigate to the Manage Data or Pinboards screen using the icons in the top
navigation bar.
2. Choose Select sticker, scroll to the bottom of the list, and click + Add.
Figure 43: Add a sticker
3. Type the name for the new sticker.
Model, link, and tag your data for searching
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 131
4. You can change the name or color of a sticker by clicking the edit icon next to
its name.
Figure 44: Edit a sticker
Apply a sticker
Apply a sticker whenever you want to tag a data source, worksheet, or pinboard
to make it easier to find.
Only administrators create stickers, but anyone with edit privileges can tag an
object with a sticker.
To tag an object with a sticker:
1. From the top menu, choose Answers, Pinboards, or Data.
Figure 45: Choose Answers, Pinboards, or Data
2. Find the item(s) you want to tag in the list, and check the box next to its
name.
3. Click the apply sticker icon and choose one from the list.
You can apply as many stickers as you like to an object.
Model, link, and tag your data for searching
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 132
Figure 46: Choose a sticker to apply
Filter by a sticker
Whenever you are selecting objects from a list, you can filter by sticker to find
what you're looking for.
Anyone can use stickers to filter lists of pinboards or data sources. You can also
filter by sticker when selecting data sources.
To filter by sticker:
1. From the top menu, choose Answers, Pinboards, or Data.
Figure 47: Choose Answers, Pinboards, or Data
2. Click on Select sticker, and select a sticker to filter by. Click on its name.
Model, link, and tag your data for searching
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 133
Figure 48: Filter by a sticker
Simplify searching with worksheets
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 134
Chapter 4: Simplify searching with worksheets
Simplify searching with worksheets
Topics:
Create a new worksheet
Edit a worksheet
Delete a worksheet or
table
After modeling the data, create worksheets to make
searching easier. A worksheet groups multiple related
tables together in a logical way.
Worksheets are flat tables created by joining columns
from a set of one or more tables or imported
datasets. You might use a worksheet for these
reasons:
To pre-join multiple tables together.
To give a user or group access to only part of the
underlying data.
To include a derived column using a formula.
To rename columns to make the data easier to
search.
To build in a specific filter or aggregation.
Users are often unfamiliar with tables and how they
are related to one another. For example, a sales
executive might need to search for information about
retail sales. The required data could be contained
in several tables (sales, customers, products, stores,
etc.), with foreign key relationships between them.
An administrator who is familiar with the data model
can create a retail sales worksheet, that combines
all of the related fact and dimension tables into a
single, easy-to-use view, and share it with the sales
executive. This provides access to the data without
requiring an understanding of how it is structured.
Simplify searching with worksheets
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 135
You will typically create one worksheet for each set
of fact and dimension tables. For example, you may
have a sales fact table and an inventory fact table.
Each of these fact tables shares common dimensions
like date, region, and store. In this scenario, you
would create two worksheets: sales and inventory.
The following diagram depicts the workflow for
creating the sales worksheet.
Figure 49: Workflow for creating a sales worksheet
The process for creating a worksheet is:
1. Decide which tables to use for the worksheet.
2. Create a new worksheet.
3. Add sources (tables) to the worksheet.
4. Choose the inclusion rule to apply.
5. Choose the worksheet join rule.
Simplify searching with worksheets
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 136
6. Select the columns to include.
7. Create formulas, if needed.
8. Save the worksheet.
9. Share the worksheet with groups or users.
An alternative way to create a worksheet is to
do a search and save it as a worksheet. See the
ThoughtSpot User Guide for details on how to do
this.
Simplify searching with worksheets
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 137
Create a new worksheet
Create a worksheet to make the data easy for users to search. This process
includes adding a new worksheet, after which you will choose the data sources
to include in it.
To create a new worksheet:
1. Click on Data, on the top navigation bar.
Figure 50: Data
2. Click the Actions icon from the upper right side of the screen, and select
Create worksheet.
Figure 51: The Create worksheet icon
Add sources and columns to a worksheet
After creating a worksheet, you need to add the sources that contain the data.
Sources is another name for tables. The sources you choose are typically related
to one another by foreign keys.
To add the sources to the worksheet:
1. Click on the Choose Sources link.
Simplify searching with worksheets
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 138
Figure 52: The Choose Sources link
2. Check the box next to each of the sources you want to include in the
worksheet.
Note that the list of sources only shows the tables on which you have view
privileges.
Simplify searching with worksheets
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 139
Figure 53: Choose sources for a worksheet
3. If you want to see what the data inside the sources looks like, click Explore all
data.
4. Choose the inclusion rule.
5. Choose the worksheet join rule.
6. Click Done to save your changes.
7. Expand the table names under Columns to select the columns to add to the
worksheet.
a) To add all of the columns from a table, click on the table name and click +
Add Columns.
b) To add a single column, double click on its name.
Simplify searching with worksheets
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 140
c) To add multiple columns, Ctl+click on each column you want to add and
click + Add Columns.
Note that once you add a column, non-related tables (i.e. those without a
primary/foreign key relationship) become hidden. If you are working with two
tables that should be related, but are not, you can add a relationship between
them.
8. Click on the worksheet title to name it, and then Save it.
9. Click on each column name to give it a more user-friendly name for searching.
You can tab through the list of columns to rename them quickly.
10.If you want to add a prefix to the name of several columns, select them, click
the Add Prefix button, and type in the prefix.
Figure 54: Add a prefix to column names
11. Click Actions and select Save.
Simplify searching with worksheets
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 141
Figure 55: Save a worksheet
12.Share your worksheet, if you want other people to be able to use it.
How the inclusion rule works
Use the inclusion rule to specify which data to include in a worksheet where two
or more tables are joined. If you are familiar with SQL, you might think of it as a
JOIN condition.
In the case where some of the rows in the fact table do not have a match in
one of the dimension tables, the inclusion rule determines whether or not they
will be shown. Because of this, the inclusion rule can affect the number of rows
the worksheet will contain. Setting the inclusion rule differently will affect the
number of rows in the worksheet if some of the values in a table are empty (or
NULL) or if some primary key column values in a fact table do not have a match
to a foreign key in the dimension table.
Only rows in the fact table (also known as the LEFT table) are affected by the
inclusion rule. It works like this:
If you choose Apply full outer join (Full Outer Join), the results of both the
left and right outer joins are combined, and all matched or unmatched rows
from the tables on both sides are shown.
If you choose Apply left outer join (Left Outer Join), all possible rows in
the fact table are shown, regardless of whether they have a match in the
dimension tables.
Simplify searching with worksheets
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 142
If you choose Apply right outer join (Right Outer Join), all possible rows in
the second table are shown, regardless of whether they have a match in the
dimension tables.
If you choose Exclude empty rows (Inner Join), any rows that do not have a
match in one of the dimension tables, won't be shown in search results.
When using Exclude empty rows (Inner Join), the number of rows in the
resulting worksheet can differ from the number of rows in the table when
accessing it directly, because of the join condition. The worksheet acts like a
materialized view. This means that it contains the results of a defined query in
the form of a table.
If you find that the charts and tables built on a worksheet contain a large number
of null values (which display as {blank} in the web browser), you can fix this by
changing the inclusion rule for the worksheet.
The answer returned when searching using a worksheet as the source can be
different from the answer you get when using the table directly as a source.
When using a worksheet as a source, even if you were to select fields that come
from only one table in your search, any underlying joins to other tables will still
be active. When using the table directly as the source, you will see every value.
This is best understood through an example.
A typical sales fact table contains a column with the employee ID
of the person who made the sale. The employee ID column has a
foreign key in the employee dimension table. This is the relationship
used to join the two tables.
Sometimes a sale has been made directly or through a reseller,
without involving a sales person. In this case, the employee ID
value for the sale will be empty in the fact table. If you wanted the
worksheet to include all sales, regardless of whether or not they
were associated with a sales person, you would choose Include all
Simplify searching with worksheets
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 143
rows (Left Outer Join). If you only want the worksheet to contain
sales made by employees, you would choose Exclude empty rows
(Inner Join).
How the worksheet join rule works
Use the worksheet join rule to specify when to apply joins when a search is done
on a worksheet. You can either apply joins progressively, as each search term is
added (recommended), or apply all joins to every search.
Often, a worksheet includes several dimension tables and a fact table. With
progressive joins, if your search only includes terms from the fact table, you'll
see all of the rows that satisfy your search. But as you add terms from dimension
tables, the total number of rows shown may be reduced, as the joins to each
dimension table are applied.
It works like this:
If you choose Apply joins progressively (recommended for most cases), joins
are only applied for tables whose columns are included in the search.
If you choose Apply all joins, all possible joins are applied, regardless of which
tables are included in the search.
When using Apply joins progressively, the number of rows in a search using the
worksheet depends on which tables are part of the search. The worksheet acts
like a materialized view. This means that it contains the results of a defined query
in the form of a table. So if a particular dimension table is left out of the search,
its joins are not applied.
About the worksheet join rule with Rule-Based Row Level Security
When working with worksheets and row level security, you need to understand
how joins are applied. This is especially important if your schema includes any
chasm traps.
Simplify searching with worksheets
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 144
This section applies only to the newer Rule-Based Row Level Security. If you are
using the older, Legacy Row Level Security (not recommended), see About the
worksheet join rule with Legacy Row Level Security.
Rule-Based Row Level Security with worksheets
In the past, if you used the Legacy Row Level Security, you could depend
on the worksheet join rule to protect sensitive data, based on the row level
security settings on a single table. But now, with Rule-Based Row Level Security,
you need to protect every table that contains any sensitive data. To do this,
you'll grant access by creating explicit row level security rules on each of the
underlying tables which contain data that row level security should apply to.
When creating the row level security rules for a table that's part of a worksheet,
you aren't limited to referencing only the columns in that table. You can specify
columns from other tables in the worksheet as well, as long as the tables are
joined to the table you're creating the rule on. Then, when creating a worksheet
on top of them, the behavior is consistent regardless of the worksheet join rule
you choose. Users will never be able to see data they should not, regardless of
what their search contains.
Example of using Rule-Based Row Level Security to secure a table
Imagine you have a worksheet that contains a "Sales" fact
table, and "Customer" and "Product"dimensions that are joined
on "Customer SSN" and "Product Code" columns. In order to
secure the "Sales" table, you can use "Customer Name" from the
"Customer" column to create a row level secuirty rule.
Chasm Trap
This is particularly important with chasm trap schemas. For chasm trap schemas,
if row level security is only set on one of the tables, people could see data they
Simplify searching with worksheets
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 145
should not see if the scope of their search does not include that table. (this
protects the from having people see the wrong things if they have chasm trap).
For any worksheets that include a chasm trap, you need to use the new Rule-
Based Row Level Security. In fact, starting in release 3.3, if you have existing
Legacy Row Level Security built on a chasm trap schema, you'll need to migrate
to the new row level security before you can use them anymore. If you were still
using Legacy Row Level Security, after upgrading to 3.3.x, you would not be able
to access any of those worksheets. You’d see a message advising you to migrate
to the newer Rule-Based Row Level Security.
Note also that for chasm trap worksheets, progressive and non-progressive joins
do not apply. There is an entirely different methodology for how worksheet joins
on a chasm trap schema work with row level security. So you can safely ignore
that setting.
About the worksheet join rule with Legacy Row Level Security
When working with worksheets and Legacy Row Level Security, you need to
understand how joins are applied. This section gives some examples to explain
the interaction between these two concepts.
This section applies only to the older Legacy Row Level Security. If you are
using the newer, Rule-Based Row Level Security (recommended), see About the
worksheet join rule with Rule-Based Row Level Security. If your schema includes
any chasm traps, you must use the newer Rule-Based Row Level Security.
Worksheet schemas and the root table
To understand how the worksheet join rule is applied, you first need to
understand worksheet schemas and the concept of root tables. When you create
a worksheet, you're effectively creating a self-contained schema made up of
the tables in the worksheet and the relationships (joins) between the tables.
The joins (represented by arrows in the diagram) reflect the primary key/foreign
key relationships between the tables in the underlying database schema. The
Simplify searching with worksheets
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 146
concept of the "root" table in the worksheet schema becomes important for
understanding how the joins are applied when searching. In this context, the root
table is specific to the schema structure defined for that worksheet.
Suppose you created a worksheet with a schema like the example in the
diagram:
Figure 56: Example of a worksheet data schema
Imagine your schema as a tree, with a root, branches, and leaves. In this diagram,
the root table is the fact table Sales. The root table is typically a fact table. It
is the table that does not have a foreign key pointing to it. So if you draw out
the schema like the diagram, the tables at the bottom can be referred to as
the "leaves" in the schema. If the worksheet only included the tables Products,
Departments, and Suppliers, then the root table would be the Products table.
Simplify searching with worksheets
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 147
Similarly, if the worksheet only included Inventory and Warehouses, the root
table would be Inventory.
Apply all joins
When you choose Apply all joins when creating a worksheet, all joins between
the tables get pre-applied, whether or not there is row level security present.
This is the simplest case.
Apply joins progressively without row level security
What if you choose Apply joins progressively when creating a worksheet, and
none of the tables in ThoughtSpot have row level security applied? In this case,
the joins will be applied using the worksheet schema as in these examples:
Example 1: Progressive join with tables from one branch
Joins are applied from the root table of the worksheet down to the
lowest leaf table involved in the search. If the worksheet includes
all of the tables in the diagram, but when doing a search we choose
only columns from Products and Departments, the joins get applied
starting at the root table and moving down to all of the tables
included in the search. That is, joins from Sales to Products to
Departments (Join 1 and Join 2) will be used.
Example 2: Progressive join with tables from different branches
Joins are applied from the root table of the worksheet, moving
down each branch, to the lowest leaf tables involved in the search.
If we searched on columns from only the Suppliers and Warehouses
tables, the join path would traverse down the tree to reach the
lowest leaf table in each branch. That is, the joins applied would be:
Join 1
Simplify searching with worksheets
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 148
Join 3
Join 4
Join 5
Apply joins progressively with row level security
If any of the tables in ThoughtSpot have level security applied to them, the joins
used in your worksheet will be affected like this:
If the row level security is applied only outside the scope of the worksheet
schema, the join behavior is the same as when there is no row level security in
the system.
If the worksheet contains even a single table with row level security, non-
progressive joins (Apply all joins) will be used if the join path includes the
table with row level security. Remember that the join path begins at the root
table and moves down to each of the leaf tables included in the search. So a
table with row level security may occur in the join path even if its columns are
not included in the search.
Example 3: Progressive joins when a table outside of the join path
has row level security
Assume the table Departments has row level security applied,
so that department managers can only see the department they
manage. If we did a search on the tables Suppliers and Products,
progressive joins would be used. The join path would be Join 1, Join
3. The Department table row level security would not apply, since
the Departments table is not in the join path.
Simplify searching with worksheets
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 149
Example 4: Progressive joins when a table in the join path has row
level security
Assume now that the table Products has row level security applied,
so that buyers could only see the products they order. If we did a
search on the tables Departments and Warehouses, Apply all joins
would be used, so the Products row level security would apply.
The join path would be Join 1, Join 2, Join 4, and Join 5. This join
path takes us through the Products table, which explains why its
row level security would affect the search results, even though no
columns from the Products table are included in the search.
Chasm trap
If you have a worksheet that includes a chasm trap, you cannot use the Legacy
Row Level Security. You must migrate your row level security settings to use
Rule-Based Row Level Security.
About aggregated worksheet and table joins
You have the ability to join an aggregated worksheet with a table.
Previously you could only join aggregated worksheets with other aggregated
worksheets. Now, you can join an aggregated worksheet with a system table by
creating a relationship. This means aggregated worksheets now behave similar
to tables, and they can be used in the same way as a table, excluding any TQL
manipulation.
You also have the capability to create a worksheet on top of an aggregated
worksheet. So aggregated worksheets can be included as tables in regular
worksheets.
Simplify searching with worksheets
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 150
About formulas in worksheets
You can define formulas and use them to create derived columns in worksheets.
You create formulas by combining standard functions and operators, column
names, and constant values.
Anyone who can create a worksheet can add a formula to it. Formulas are not
reusable; the formula you create is associated only with the worksheet it belongs
to.
A complete list of available formulas and examples of each is available in the
Formula reference.
Create a formula in a worksheet
You can create a formula in a worksheet by using the Formula Builder. When you
do this, the result of the formula gets added to the worksheet as a column.
Use these steps to create a formula:
1. Create a new worksheet, or edit an existing one.
2. Click the + button next to Formulas.
Simplify searching with worksheets
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 151
Figure 57: Create a new formula in a worksheet
3. Type your formula in the Formula Builder.
Figure 58: Use the Formula Builder
Simplify searching with worksheets
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 152
Note: Formulas elements are color coded by type and can include the
formula operators and functions (blue), the names of columns (purple),
and/or constants (black).
4. If you want to change what your formula returns, use the Advanced settings.
Depending on your formula, you may be able to change:
Data type
ATTRIBUTE or MEASURE
Aggregation type
Figure 59: Advanced settings in the Formula Builder
5. You can see a list of formula operators with examples by clicking on Formula
Assistant.
Simplify searching with worksheets
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 153
Figure 60: Examples in the Formula Assistant
6. Name the formula by clicking on its title and typing the new name. Click Save.
Edit a worksheet
As long as you have permissions to edit a worksheet, you can always go into
it and make changes, such as adding sources and columns, adding or editing
formulas, and changing column names.
To edit a worksheet:
1. Click on the Data icon on the top navigation bar and then on Worksheets.
Simplify searching with worksheets
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 154
Figure 61: Go to the worksheet list
2. Click on the name of the worksheet you want to edit from the list.
3. Click the Edit button in the upper right hand side of the screen.
4. Make your changes to the worksheet.
5. Click Actions and select Save.
Figure 62: Save a worksheet
Rename a worksheet or table
You can change a worksheet or table name from the ThoughtSpot application.
To change the name of a worksheet or table:
1. Click on Data, on the top navigation bar.
Simplify searching with worksheets
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 155
Figure 63: Data
2. Find the worksheet or table you want to rename and click on its name.
3. On the right hand side, click the current name, and enter a new name.
Figure 64: Enter a new name
You can also edit column names and other details in the same way.
4. Click Done and Save.
Change the inclusion or join rule for a worksheet
As long as you have permissions to edit a worksheet, you can always go into it
and set a different inclusion rule or join rule.
If you find that the charts and tables built on a worksheet contain a large number
of null values (which display as {blank} in the web browser), you can fix this by
changing the inclusion rule for the worksheet.
To change the inclusion or join rule of a worksheet:
1. Click on the Data icon on the top navigation bar and then on Worksheets.
Simplify searching with worksheets
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 156
Figure 65: Go to the worksheet list
2. Click on the name of the worksheet you want to edit from the list.
3. Click the Edit button in the upper right hand side of the screen.
4. Click on the Choose Sources link.
Figure 66: The Choose Sources link
5. Choose the inclusion rule and/or the worksheet join rule.
Simplify searching with worksheets
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 157
Figure 67: The worksheet join rule and inclusion rule
6. Click Done.
7. Click Actions and select Save.
Figure 68: Save a worksheet
Delete a worksheet or table
When you try to delete a worksheet or table, you'll see a message listing any
dependent objects that must be removed first.
ThoughtSpot checks for dependencies whenever you try to remove a table or
worksheet. A list of dependent objects is shown, and you can click on them to
delete them or remove the dependency. Then you'll be able to remove the table
or worksheet.
Simplify searching with worksheets
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 158
To delete a worksheet or table:
1. Click on Data, on the top navigation bar.
Figure 69: Data
2. Find the worksheet or table you want to remove in the list, and check the box
next to its name.
3. Click the Delete icon.
Figure 70: The Delete icon
4. If you are attempting to delete a data source with dependent objects, the
operation will be blocked. You will see a warning, with a list of dependent
objects with links. Click on the link for an object to modify or delete it. When
all its dependencies are removed, you will be able to delete the data source.
Simplify searching with worksheets
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 159
Figure 71: Dependent objects warning
5. You can also click on the name of a worksheet or table and then click
Dependents, to see a list of dependent objects with links.
The Dependents list shows the names of the dependent objects (worksheets
and pinboards), and the columns they use from that source. You can use this
information to determine the impact of changing the structure of the data
source or to see how widely used it is. Click on a dependent object to modify
or delete it.
Figure 72: Dependent objects message
Manage users, groups, and privileges
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 160
Chapter 5: Manage users, groups, and privileges
Manage users, groups, and privileges
Topics:
About privileges
Add a group and set
security privileges
Edit or delete a group
Add a user
Add multiple users to a
group
Edit or delete a user
Forgotten password
Before people can log in and use ThoughtSpot,
you need to create a username, a password, and
a membership in one or more groups for them.
Creating groups and assigning users to them makes
privilege management easier.
Ways of managing users and groups
This section describes manual creation of users,
groups, and privileges, but you can also manage
users through LDAP or SAML. For information on
setting up SAML authentication, see the ThoughtSpot
Application Integration Guide.
The "All" group
There is a default group called All, which includes
every user in ThoughtSpot. When you create a
new user, they will be added to the All group
automatically. You cannot delete the All group or
remove members from it.
Privileges
Privileges determine what kinds of actions users are
allowed to do. Plan your groups so that you can use
them to assign a common set of privileges to multiple
users. Privileges are set at the group level. For more
information on the privileges you can assign and how
to assign them, see About privileges.
Manage users, groups, and privileges
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 161
Nested groups (groups within groups)
You can also have a hierarchy of groups. That is,
groups can belong to (i.e. be children of) other
groups. When using group hierarchies, permissions
are inherited from the parent group. So if you're a
member of a sub-group, you would automatically
have the privileges of the parent group.
Manage users, groups, and privileges
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 162
About privileges
You can assign privileges at the group level. Then you create users and assign
them to groups. This is how you grant users access to different capabilities in
ThoughtSpot.
Each group includes a set of privileges for its users. Good planning when
creating groups and assigning privileges will pay off in ease of administration
and a better search experience. The privileges a group has determine the actions
that its members are allowed to do. If a user belongs to more than one group,
they will have the highest level of the privileges from all the groups they belong
to.
Here are the different privileges, and the capabilities they enable:
Table 20: Group Permissions
Privilege Description
Has administration privileges Can manage Users and Groups and has view
and edit access to all data.
Can upload user data Can upload their own data from the browser
using Import Data.
Can download data Can download data from search results and
pinboards.
Can share with all users Can see the names of and share with users
outside of the groups the user belongs to.
Can manage data Can create a worksheet. Can also create an
aggregated worksheet from the results of a
search by selecting Save as worksheet. Can
also use ThoughtSpot Data Connect, if it is
enabled on your cluster.
Can schedule pinboards Can create pinboard schedules and edit their
own scheduled jobs.
Manage users, groups, and privileges
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 163
Privileges are additive, meaning that if a user belongs to more than one group,
they will have the highest level of privileges from among the groups they are a
member of. They are also inherited from the parent, so that a sub-group gets all
the same privileges of its parent, all the way up the group hierarchy.
If you add the privilege Has administration privileges to a group, note
that all users in that group will be able to see all the data in ThoughtSpot.
Administrators can see all data sources, and Row level security does not apply to
them.
There is a default group called All, which includes every user in ThoughtSpot.
When you create a new user, they will be added to the All group automatically.
You cannot delete the All group or remove members from it. If you have a
common set of privileges you want every user to have (typically Can upload user
data and/or Can download data), add those privileges to the All group.
Permissions to see and edit tables, worksheets, and pinboards are set when you
share them with users and groups, as described in the topic Data security.
Add a group and set security privileges
Before adding users, create the groups they will belong to. Each group includes
a set of privileges for its users. Good planning when creating groups and
assigning privileges will pay off in ease of administration and a better search
experience.
To create a group and add privileges for the group:
1. Log in to ThoughtSpot from a browser.
2. Click on the Admin icon, on the top navigation bar.
Figure 73: The Admin icon
Manage users, groups, and privileges
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 164
3. In the Admin panel, click on User Management and Groups.
Figure 74: Manage Groups
4. Click the + Add Group button on the upper right hand side of the list of
groups.
Figure 75: Add a new Group
5. Enter the details for the new group:
Manage users, groups, and privileges
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 165
Figure 76: Enter Group details
a) Enter a unique name for the group.
b) Optionally enter a description.
c) Check the privileges you want to grant to the group.
If you add the privilege Has administration privileges to a group, note
that all users in that group will be able to see all the data in ThoughtSpot.
Administrators can see all data sources, and Row level security does not
apply to them.
d) Click the Manage Groups tab if you want to add sub-groups. Find the
groups you want to add in the list, or search for them by name. Check the
box next to each group you want to add to the group.
e) Click the Manage Users tab if you want to add users. Find the users you
want to add in the list, or search for them by name. Check the box next to
each user you want to add to the group.
6. Click Create to create the group.
Manage users, groups, and privileges
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 166
Edit or delete a group
After adding a group, you can always go in and change its settings to add or
revoke privileges. The new settings will apply to all the group members.
To edit or delete an existing group:
1. Log in to ThoughtSpot from a browser.
2. Click on the Admin icon, on the top navigation bar.
Figure 77: The Admin icon
3. In the Admin panel, click on User Management and Groups.
Figure 78: Manage Groups
4. Find the group you want to edit in the list and click its name, or the edit icon
. If you don't see the name of the group, try searching for it.
You can also delete a group from this page by clicking the Delete icon.
Deleting a group does not delete its users.
5. Make your changes and click Update.
Manage users, groups, and privileges
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 167
Add a user
You will create a user account for each unique person who will access
ThoughtSpot, either manually or through LDAP. This procedure shows how to
creating a user manually.
When you create a user, you can assign group memberships. The group's
privileges and permissions apply to all of its members. Any user you create will
be added to the group All automatically.
1. Log in to ThoughtSpot from a browser.
2. Click on the Admin icon, on the top navigation bar.
Figure 79: The Admin icon
3. In the Admin panel, click on User Management and Users.
Figure 80: Manage Users
4. Click the + Add User button on the upper right hand side of the list of groups.
Manage users, groups, and privileges
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 168
Figure 81: Add a new User
5. Enter the details for the new user:
Figure 82: Create a user manually
a) Name: A unique name for the user (usually their first and last name).
b) Username: A login name for the user.
Note: Usernames must be unique and lowercase.
If you are using Active Directory to authenticate users, and your
LDAP configuration requires users to be created manually (i.e. they
are not created automatically in ThoughtSpot upon authentication),
the username you specify has to be domain qualified (e.g.
username@ldap.thoughtspot.com), and you must enter a dummy password.
Manage users, groups, and privileges
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 169
c) Password and Confirm Password: A temporary password.
d) E-mail Address: The user's email address. This is used for notification when
another user shares something with them.
e) Add to group: Select all the groups the user will belong to.
If you add the user to a group that has the privilege Has administration
privileges, note that they will be able to see all the data in ThoughtSpot.
Administrators can see all data sources, and Row level security does not
apply to them.
6. Click Save to create the user.
When you create a new user, the groups they belong to define the user's:
Privileges - the actions they are allowed to do, which are defined when you
Add a group and set security privileges.
Permissions - the data they can access and view, which is defined when you
Data security.
Add multiple users to a group
You can add multiple users to a group using one button.
To add multiple users to a group:
1. Log in to ThoughtSpot from a browser.
2. Click on the Admin icon, on the top navigation bar.
Figure 83: The Admin icon
3. In the Admin panel, click on User Management and Users.
Manage users, groups, and privileges
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 170
Figure 84: Manage Users
4. Select the users you would like to add to the same group from the list.
5. Click the Add Users to Groups button on the top of the list of users.
Figure 85: Add a new User
Edit or delete a user
After a user has been created, you can always go back and change their settings,
for example to change their group memberships or change their password.
As an administrator, you can edit a user and change the groups the user belongs
to. You can also edit a user to reset a user’s password by entering and confirming
the new password. This is useful if a user has forgotten their password, or to
effectively disable an account.
Manage users, groups, and privileges
ThoughtSpot Administration Guide Copyright © 2017 by ThoughtSpot. All Rights Reserved. 171
To edit an existing user:
1. Log in to ThoughtSpot from a browser.
2. Click on the Admin