Data Storytelling Using Tableau User Guide
User Manual:
Open the PDF directly: View PDF .
Page Count: 39
Download | |
Open PDF In Browser | View PDF |
DATA STORYTELLING USING TABLEAU USER GUIDE This user guide will help you through the different modules of the workshop Page 1 Author: Mallikarjun P M Copyright © Xaltius Pte. Ltd. 2019 www.xaltius.tech | info@xaltius.tech Module 1 Connect to the data source The data source is a set of flat CSV files. Tableau allows connection to other types of sources as well (MySQL, Google Analytics, Salesforce and others). For the purpose of simplicity, the workshop today covers only connections to flat(CSV) files. During the demo, the use of PostgreSQL with Tableau will also be demonstrated. Download data from the following link: http://bit.ly/aventisXtableau Page 2 Live Hands-On: Point to the data source and load the file Copyright © Xaltius Pte. Ltd. 2019 www.xaltius.tech | info@xaltius.tech Load a single file/table Once you select the CSV files, you’ll see that all your tables have been loaded and the other files are available to on the left-hand pane. Union two tables together Page 3 Drag the file Orders 2 and union it with the already present Orders 1 table. UNION adds up all the data rows together and creates a single table. Please note that the column structure must be the same for 2 files to be combined as a union. Copyright © Xaltius Pte. Ltd. 2019 www.xaltius.tech | info@xaltius.tech Choose connection type Live: updates real time. If the data in the source table changes, it will be updated on tableau as well Extract: an extract in the form of a ‘.hyper’ database that is saved onto the disk. You can choose the refresh rate at which the data needs to be updated. Join multiple tables Tableau also provisions different join types to join multiple tables together. From your knowledge of unions and joins, replicate the structure represented below. Most commonly used file types in Tableau Tableau Packaged Workbook .twbx Copyright © Xaltius Pte. Ltd. 2019 Purpose It contains information on each sheet and dashboard that is present in a workbook. It has the details of the fields, which are used in each view and the formula applied to the aggregation of the measures. It also has the formatting and styles applied. It contains the data source connection information and any metadata information created for that connection. This file format contains the details of a workbook as well as the local data that is used in the analysis. Its purpose is to www.xaltius.tech | info@xaltius.tech 4 Tableau Workbook File Extension .twb Page File Type Tableau Data Extract .hyper share with other Tableau desktop or Tableau reader users, assuming it does not need data from the server. This file contains the data used in a .twb file in a highly compressed columnar data format. This helps in storage optimization. It also saves the aggregated calculations that are applied in the analysis. This file should be refreshed to get the updated data from the source. Data Types in Tableau Data Type STRING NUMBER Description Any sequence of zero or more characters These are either integers or floating points BOOLEAN They are logical values. DATE & DATETIME Tableau recognizes dates in almost all formats. There are in-built functions in Tableau to format the dates Example ‘Hello World’ 322 145.33 TRUE FALSE "02/01/2015" "#3 March 1982" Page 5 Tableau Workflow Copyright © Xaltius Pte. Ltd. 2019 www.xaltius.tech | info@xaltius.tech Exploring the Tableau Interface What are the different components on the Tableau Interface that are important? • • • • • • • • • • • Data source Data pane Analytics pane Columns Rows Filter card Marks card Create Worksheet Create Dashboard Create Story Menu Highlights: a) File -> Export Packaged Workbook b) Data -> Refresh all Extracts, Edit Relationships c) Worksheet -> Show Summary, Run Update d) Dashboard -> Actions, Export Images e) Analysis -> Create Calculated Field Page 6 This is demonstrated during the workshop. Copyright © Xaltius Pte. Ltd. 2019 www.xaltius.tech | info@xaltius.tech Module 2 Worksheet Basics • • • • • Add Worksheet Rename Worksheet Save and Delete Worksheet Reorder Worksheet Paged Workbook – when you want to use the same visualization for a dimension Dimension and Measure Page 7 In tableau, the dimensions and measures are automatically detected, and you can see them in the respective pane. Copyright © Xaltius Pte. Ltd. 2019 www.xaltius.tech | info@xaltius.tech Continuous vs Discrete Data Examples Discrete : Copyright © Xaltius Pte. Ltd. 2019 Page 8 1. Number of children in a household 2. Number of languages a person speaks www.xaltius.tech | info@xaltius.tech 3. Number of people sleeping in stats class Continuous: 1. 2. 3. 4. Height of children Weight of cars Time to wake up in the morning Speed of the train Different combinations of dimensions/measures: Copyright © Xaltius Pte. Ltd. 2019 www.xaltius.tech | info@xaltius.tech Page You can organize your dimensions and measures in folders if required. The purpose of using folders is to organize your data better and in a more efficient way. 9 Create Folders in Tableau To do so, select on any dimension/measure and group by folder. Then, select all the variables that you want to group together using ctrl + click. Create Hierarchy in Tableau Page 10 Multidimensional (cube) data sources contain hierarchies. One of the most useful ways to navigate hierarchies is to drill down or drill up. For example, if you are examining the sales totals for various states, you can then drill down and view sales for all the cities within each state. Alternatively, if you are examining sales totals for all cities, you can then drill up and view the sales for each state. Copyright © Xaltius Pte. Ltd. 2019 www.xaltius.tech | info@xaltius.tech Drill down or drill up the hierarchy Page 11 You can also drill down or up any hierarchies, depending on what level of granularity is required. Copyright © Xaltius Pte. Ltd. 2019 www.xaltius.tech | info@xaltius.tech Plot Types Page 12 How does one choose the right visualization? The below figure provides conditions for usage of different kinds of charts. Copyright © Xaltius Pte. Ltd. 2019 www.xaltius.tech | info@xaltius.tech Module 3 Dashboards in Tableau A dashboard is a collection of several views, letting you compare a variety of data simultaneously. For example, if you have a set of views that you review every day, you can create a dashboard that displays all the views at once, rather than navigate to separate worksheets. An example of a dashboard is shown in the below figure. Actions in Tableau Dashboard Types of actions: Link to a different worksheet Link to an external web page Highlight a particular value in a dimension Filter values Page 13 • • • • Copyright © Xaltius Pte. Ltd. 2019 www.xaltius.tech | info@xaltius.tech Storyboarding in Tableau In Tableau, a story is a sequence of visualizations that work together to convey information. You can create stories to tell a data narrative, provide context, demonstrate how decisions relate to outcomes, or to simply make a compelling case. A story is a sheet, so the methods you use to create, name, and manage worksheets and dashboards also apply to stories. At the same time, a story is also a collection of sheets, arranged in a sequence. Each individual sheet in a story is called a story point Page 14 Advantages of using stories is that you can present your insights in a sequential way and uncover the story hidden in your data. Copyright © Xaltius Pte. Ltd. 2019 www.xaltius.tech | info@xaltius.tech Module 4 Data Sources Tableau can connect to many kinds of data sources if you are using the desktop version. If you are using the public version, then you can only connect to flat files (CSVs, text files and Excel files). However, do note that Tableau is not meant for data pre-processing and should not be used for complex calculations. Tableau’s native connectors can connect to the following types of data sources. • • • • File Systems such as CSV, Excel, etc. Relational Systems such as Oracle, SQL Server, DB2, PostgreSQL etc. Cloud Systems such as Windows Azure, Google BigQuery, Google Sheets etc. Other Sources using ODBC The following image shows all the sources to which you can connect Tableau to. Connection types Live: used for real time data analytics. The result of the analysis is up to the second, and the latest changes are reflected in the result. However, on the downside, it burdens the source system as it has to keep sending the data to Tableau. Copyright © Xaltius Pte. Ltd. 2019 www.xaltius.tech | info@xaltius.tech Page to the source anymore while analyzing the data. The cached file is then persisted as .hyper file, which is a sort of data store. 15 Extract: Tableau can also process data in-memory by caching them in memory and not being connected You can add constraints while creating the extract. Constraints could be 1. Filter/exclude data along a dimension/measure 2. Add only the newly added rows/values (Incremental refresh) 3. Include only the top n rows into the extract Connecting to multiple data sources Page 16 Tableau can connect to different data sources at the same time. For example, in a single workbook you can connect to a flat file and a relational source by defining multiple connections. This is used in data blending, which is a very unique feature in Tableau. Copyright © Xaltius Pte. Ltd. 2019 www.xaltius.tech | info@xaltius.tech As you can see, all three tables are taken from different data sources. • Orders table is in the database • People table is available as a CSV flat file • Returns table is available as an Excel file Different data sources have a different colour coding PRO TIP: Page 17 You can write custom SQL while connecting to a relational database store. You can restrict the amount of data being imported by applying a filter in SQL itself, which will improve the performance on Tableau. Copyright © Xaltius Pte. Ltd. 2019 www.xaltius.tech | info@xaltius.tech Union multiple tables using the wildcard method Suppose we have multiple tables that have the same structure but have different names, for example Page 18 We need to union all of them, but it may not be possible to add all of them manually if the number of such tables is too high. We can use the wildcard method instead to add all tables that start with ‘Returns’. The same wildcard method can be used to search for files inside a subfolder and also a parent folder. Copyright © Xaltius Pte. Ltd. 2019 www.xaltius.tech | info@xaltius.tech Data Source Filter Filtering in Live connections Page 19 You can only filter out data along dimensions or measures as show in the figure Copyright © Xaltius Pte. Ltd. 2019 www.xaltius.tech | info@xaltius.tech Filtering while creating/refreshing an extract Page 20 In addition to filtering along a dimension or measure, you can also use filters on the row number as well. Copyright © Xaltius Pte. Ltd. 2019 www.xaltius.tech | info@xaltius.tech Module 5 Grouping on a dimension You can group multiple unique values into a single group for a dimension variable. In this use case, we can group the “First Class” and “Same Day” delivery option as a “Premium” delivery option and “Second Class” and “Standard Class” as an “Economy” delivery option. To do so, right click on the dimension that you want to create groups for and select Create -> group • • • Select the values that you want to group together by holding down the ctrl key and left click the values Then select the Group button Rename the new group Page The method to group a continuous variable is the same, although it doesn’t make sense as there would be a lot of unique values in a measure. Instead, values in a continuous variable can be binned together to form a dimension. 21 Grouping on a measure Copyright © Xaltius Pte. Ltd. 2019 www.xaltius.tech | info@xaltius.tech Binning on a measure Right click on the measure that you want to create bins for and select Create -> bins Select the bin size. If not specified, Tableau calculates the bin size according to the formula: Number of Bins = 3 + log2(n) * log(n) where n is the number of distinct rows in the table By specifying a bin size xxx, you are instructing Tableau to create equal sized bins. For more details on binning, refer to the video here : https://www.youtube.com/watch?v=VwDPBWuHu3Q TASK - Create equal sized bins for discount measure with each bin size of 0.1 and create a histogram to show plot the bin size vs % of transactions/records. In this use case, the discount is the ration of discount in $ of a product in an order / Sale amount of that product. If discount is 0.1, it means 10% discount was offered on that product. Page 22 The final visualization should look like: Copyright © Xaltius Pte. Ltd. 2019 www.xaltius.tech | info@xaltius.tech Create Set on a dimension Sets help us to selectively highlight our points of interest in a chart. For example, if you want to select the top 3 sub-categories by sales in a sub-category vs profit plot. Drag sub-category to Columns shelf Drag profit to Rows shelf Change the shape type from the Marks card • • Right click on Sub-category -> create -> set Select the Top tab -> By field -> enter 3 in the text box -> chose sales from the field drop down menu -> select sum for the aggregation drop down menu -> rename the field to “top 3 by sales” > click ok Page 23 • • • Copyright © Xaltius Pte. Ltd. 2019 www.xaltius.tech | info@xaltius.tech • You’ll see a new set created in the sets pane. • Drag the newly created set to colour in the marks pane. In the product sub-category vs profit plot, you’ll see the top 3 by sales highlighted Filter Data Page 24 Drag any dimension/measure to the filter shelf. Copyright © Xaltius Pte. Ltd. 2019 www.xaltius.tech | info@xaltius.tech Dimension Filters Date Filter Filter a range of dates(continuous), relative dates(continuous) which start from a reference point, starting from date(continuous), ending at date(continuous), years(dimension), quarters(dimension), months(dimension), individual date(dimension) and many more other ways. General Dimension Filters For dimensions that contain string values, you can include or exclude based on 1. 2. 3. 4. the entire string itself, e.g. if you want to filter out “furniture” string contains/starts/ends with or exactly matches a pattern based on a condition, for e.g. filter the persons who’s count of return orders is >200 top n based on a measure, for e.g. filter the top 5 sub-categories based on the profit they made over the entire period TASK: Page 25 1. The data we have is from 1st Jan 2011 to 31st Dec 2014. Create a category-profit% pie chart and show a dynamic date filter to filter only profits made from 1st Aug 2012 to the end of period 2. Filter the top 10 products by sales in 2012 Copyright © Xaltius Pte. Ltd. 2019 www.xaltius.tech | info@xaltius.tech Module 6 Create Calculated fields This feature in tableau can be used to create derived variables. For example: • • • • • • • Derive the day of week of an order date Derive the first name of a customer from the customer’s full name Derive the discount amount by using the sales amount and % discount Derive the GDP per capita from the total GDP and total population of a country Derive if a customer is a low value, medium value or high value customer based on the amount spent at the store From the menu option, go to Analysis -> Create Calculated Field Write the formula or the if-else statement (to create a derived dimension using the values of a measure) You can also use the in-built quick table calculations to quickly calculate a few metrics from a measure. To do this, drag a measure to the column/row shelf or the marks pane -> right click on the measure > select the aggregation Page 26 • Copyright © Xaltius Pte. Ltd. 2019 www.xaltius.tech | info@xaltius.tech You can also select the aggregation method for a measure. The default aggregation is SUM() if you drag a measure to the column/row shelf or marks pane. You can change that by doing • Right click on the measure from row/column shelf or marks pane -> Measure The quick table calculations and aggregations can be used in visualizations to quickly draw some insights. TASK: Page 27 1. Plot a chart to show %increase or decrease in sales compared to previous quarter for furniture category 2. Plot a dual axis chart to check if the profit margin (percentage of profit) always decreases with decrease in sales for each category Copyright © Xaltius Pte. Ltd. 2019 www.xaltius.tech | info@xaltius.tech The final visualization should look something like this: Functions in Tableau Functions available in Tableau are similar to the functions that are available in Excel. You can use the in-built functions mentioned on the official (https://onlinehelp.tableau.com/current/pro/desktop/en-us/functions.htm) documentation link TASK Page 28 1. Derive the order day of week. This dimension will be used for the sales amount comparison per day of week. We want to test out the hypothesis that Customers spend different amount on weekends as opposed to weekdays 2. Derive the undiscounted sale price of each product (one record can be treated as an individual product in our use case) HINT: use the formula undiscounted price of product = sales/(quantity * (1-discount)) 3. Derive the cost price of each line item HINT: use the formula Cost price of line item = (Sales – profit – shipping cost)/quantity Copyright © Xaltius Pte. Ltd. 2019 www.xaltius.tech | info@xaltius.tech Module 7 Top N analysis Calculation of top n is useful when you want to display only the top n per group. For example, to show the top 3 countries by sales for each market in the data. • • • • • • • Create a new derived variable called Rank. Assign INDEX() to the value of this variable Drag Market from dimensions to Rows Drag Sales from measures to Columns. Make sure the aggregation is SUM Drag Rank to Rows, to the right of Market Drag Country to Rows, to the right of Rank Sort Country based on the sum of sales in descending order Right click Rank -> compute using -> Pane Down Take a closer look. The rank restarts from 1 for every market. This way, each country has a rank (highest sales to lowest sales) in its respective market • • Hold Ctrl and Click Rank from Rows, drag it to the filters pane. Select the numbers 1,2 and 3 Format the chart to show the actual sales values Filter by Parameter A parameter allows you to dynamically show the same chart for different dimension. For example, if you want to show the Top 3 countries per market but you want the user to choose from Sales/Quantity/Profit/Shipping Cost then use parameter as a filter. • • • • • Create a new parameter and call it Metric Data type -> Integer, Allowable values -> List Add value 1, 2, 3 and 4. Display as Sales for ‘1’, Profit for ‘2’, Quantity for ‘3’ and Shipping Cost for ‘4’ Set current value to ‘Sales’ Create new calculated field ‘Metric Selection’. Use the following for derivation: IF [Metric] == 1 THEN [Sales] ELSEIF [Metric] == 2 THEN [Profit] ELSEIF [Metric] == 3 THEN [Quantity] 29 ELSEIF [Metric] == 4 THEN [Shipping Cost] Page END Copyright © Xaltius Pte. Ltd. 2019 www.xaltius.tech | info@xaltius.tech • • • • • Right click on metric -> show parameter control. You will see a Metrics card at the right Drag Order date to column and convert to continuous, right click and choose quarter Drag Metric Selection variable from measure and drop it to Rows Create a line chart to see the overall trend Select the metric from the Metric card. You can now switch from seeing the Sales figures to any of Profit/Quantity/Shipping Cost Add Reference Line Reference line is useful when you want to have a high-level statistic view (average/median/max) on the entire table Create a Year-Quarter vs Sum of Sales bar chart From the analytics pane, drag the Reference Line on top of the chart Drop on the “table” option Page 30 • • • Copyright © Xaltius Pte. Ltd. 2019 www.xaltius.tech | info@xaltius.tech Module 8 Analytics functions in Tableau Average line Can be used to add a reference average line or an average line with confidence interval • • • Drag Order date to Column and make it continuous. Drill down to Month level Drag Sales to Rows. Make sure the aggregation is SUM From the analytics pane, drag the Average with 95% CI to the chart and choose the table in distribution-band TASK – reproduce the below chart Trend Line • From the analytics pane, drag the trend line and drop on the chart to see the overall trend. This trend line fits a regression line on the data. The trend line also shows the coefficients, R Squared value and P value Page 31 TASK - Reproduce the below chart. The line that cuts across is the trend line or the regression line Copyright © Xaltius Pte. Ltd. 2019 www.xaltius.tech | info@xaltius.tech Forecast Similar to the average line and trend line, drop the forecast on the same chart. This analytics method creates a time series model to forecast the value (always should be a measure). TASK - Create a forecast for the next 5 months. After the forecast is created, see the description of the Page 32 forecast. It should look something like this: Copyright © Xaltius Pte. Ltd. 2019 www.xaltius.tech | info@xaltius.tech Drag Profit to Column Drag Sales to Row Drag Customer Name to Detail From analytics, drag cluster to the chart. This will create 2 clusters • From the marks pane, right click on the cluster and edit cluster. You can increase the number of clusters and change the clustering criteria. Page • • • • TASK - Describe the cluster and interpret the results Copyright © Xaltius Pte. Ltd. 2019 33 Clustering www.xaltius.tech | info@xaltius.tech Module 9 Map visualizations in Tableau Tableau provides 2 types of map visualizations – Symbol map and Filled map Symbol Map • • • • • Tableau automatically detects a country/state/city/postal code and generates the corresponding latitude/longitude Drag latitude to rows Drag longitude to columns Choose the symbol maps visualization Drag Country to detail Drag Sales to colour Copyright © Xaltius Pte. Ltd. 2019 www.xaltius.tech | info@xaltius.tech Page • 34 Symbol map is useful when you want to add an extra dimension. Use Cases of Map Visualizations In the above image, a pie chart is overlaid on top of each state. The pie chart shows the percentage share of profit for each category in each state. The colour code shows the profit value of each state. To see how it is done, follow the link Page 35 https://onlinehelp.tableau.com/current/pro/desktop/en-us/maps_howto_filledpiechart.htm Copyright © Xaltius Pte. Ltd. 2019 www.xaltius.tech | info@xaltius.tech Filled Map Filled maps in tableau are good to show clear differences in a metric. In the above visualization, the colour legend corresponds to the total sales for a country. The labels are set by calculating the percentage of total sales. TASK - Replicate the above visualization Web Map Service Tableau comes pre-loaded with a default background maps. You can change the default background maps by using a WMS of your choice or Mapbox. In this exercise, we will see how to change the default background maps to Mapbox. • • • • • Go to Map -> Background Maps -> Map Service -> Add -> Mapbox Maps -> Classic You can create a free account on Mapbox and use the public token to use the Mapbox service. For convenience purpose, the team has set up a temporary Mapbox public token for this workshop. Please note that it will be deleted after the workshop Enter the following public token into the API access token field pk.eyJ1IjoiYXJqdW5wbW0iLCJhIjoiY2p1YXlkbWszMDc5NDQycGMzeDV0MzZnOS J9.NIOIaN-0nSBrpMQ8OcRSbQ Choose the Run + Bike + Hike in pre-set style. Give it a name of your choice, click OK Go to Map -> Background map -> select the newly added map service Page 36 TASK - Recreate the Symbol Map using the newly added WMS Copyright © Xaltius Pte. Ltd. 2019 www.xaltius.tech | info@xaltius.tech Module 10 LOD Calculations LOD calculations are similar to group by aggregation in SQL. LODs are useful when the data is replicated at a particular dimension. If you want to run aggregation like SUM() on that dimension, the replicated data will lead to incorrect results. For example : If you want to create a frequency count chart to see the Total Order Quantity vs Number of unique orders (how many orders had x items in it), the usual method that strikes your mind is to create bins from Quantity -> drag the Quantity bins to columns -> drag Order ID to Rows -> convert Order ID to COUNTD -> create a histogram. This would produce the following visualization: Which is totally wrong. For each Order ID -> Sum the Quantity measure Copyright © Xaltius Pte. Ltd. 2019 www.xaltius.tech | info@xaltius.tech Page that contribute to this sum by viewing the data. You will notice that the Order ID is being repeated in some of the rows and the true sum of quantity is calculated using the following logic: 37 TASK: Calculate the sum of quantity for order id ‘IN-2014-15263’ in Tableau. Check the individual rows LOD to the rescue • • • • • Create a calculated variable with the following formula: {FIXED [Order ID]: SUM([Quantity]) } Here, FIXED tells Tableau to fix Order ID as the dimension and sum the quantities for each Order ID. Looks similar to GROUP BY clause, doesn’t it? It is! Create bins for this new variable with bin size = 1 or you can create a parameter that can dynamically change the bin size Drag the bins dimension to rows and order ID to columns. Convert Order ID to COUNTD From SHOW ME, choose the text table Now you see the true quantities grouped by order ID. To verify, view the data corresponding to order quantity = 58. It will show all the corresponding data Now create a histogram. You’ll see a long-tailed histogram that shows the true count of orders vs total quantity per order Page Tableau offers FIXED, INCLUDE and EXCLUDE level of detail. For detailed explanation, refer to the following links: • https://onlinehelp.tableau.com/current/pro/desktop/en-us/calculations_calculatedfields_lod.htm • https://www.youtube.com/watch?v=kmApWaE3Os4 38 • Copyright © Xaltius Pte. Ltd. 2019 www.xaltius.tech | info@xaltius.tech TASK: Create a visualization that would show how the sales of all sub-categories compare to that of a selected sub-category. The final visualization should look like below. The sub-category selection is dynamic here. HINT: Page 39 1. Create a parameter to select the sub-category 2. Create a new field (Selected-Sale) to show sales of selected sub-cat and rest to zero. Total sales of a sub-cat subtracted by Selected-Sale will give us the difference in sales if [Sub-Category] = [Select sub-category] THEN [Sales] ELSE 0 END 3. Create new variable (Sale of selected sub-cat) to exclude the selected sub-cat and sum SelectedSale. We will subtract this value from the total sales of each sub-cat to get the actual difference {EXCLUDE [Sub-Category] : SUM([Selected sales])} 4. Create a new variable (Sales diff from selected) SUM([Sales]) - SUM([Sales of Selected Sub-Cat]) Copyright © Xaltius Pte. Ltd. 2019 www.xaltius.tech | info@xaltius.tech
Source Exif Data:
File Type : PDF File Type Extension : pdf MIME Type : application/pdf PDF Version : 1.7 Linearized : No Author : Aditya Arasanipalai Mohan Create Date : 2019:04:19 23:42:55+08:00 Modify Date : 2019:04:20 16:37:33+08:00 Has XFA : No Language : en-US Tagged PDF : Yes XMP Toolkit : Adobe XMP Core 5.6-c016 91.163616, 2018/10/29-16:58:49 Producer : Microsoft® Word for Office 365 Format : application/pdf Creator : Aditya Arasanipalai Mohan Creator Tool : Microsoft® Word for Office 365 Metadata Date : 2019:04:20 16:37:33+08:00 Document ID : uuid:62DE64CB-3857-4A6B-887A-73877652B962 Instance ID : uuid:cb7643a6-9aa2-4409-98c6-cc9fe03d662c Page Count : 39EXIF Metadata provided by EXIF.tools