Metatron Discovery.user.manual.en

Page Count: 168


2.0
Discovery User Manual
Easy and quick connection, analysis, and visualization of
Big and complicated data

Part 1

Discovery Outline

The content of this document must not be copied,
distributed or used in part or in whole without the prior approval of SK Telecom.

Contents
1. Advantages of metatron Discovery
2. Structure of metatron Discovery
3. Relational OLAP vs. Multidimensional OLAP
3.1 Characteristics and limits of the relational OLAP system
3.2 Multidimensional OLAP system and Druid engine

1. Advantages of metatron Discovery
metatron Discovery is a single-solution data discovery product. It supports end-to-end functions
for massive data, from data preparation to visualization-based exploration and advanced
analysis. The following figure summarizes the architecture and main advantages of metatron.

[Figure: Architecture and advantages of metatron-based technology]

Architecture (top to bottom): Application (visualization, notebook, etc.); Analysis & analytics
tools; Data Processing Engine; Storage (Big Data file system); Data Source

Advantages:
• Rapid visualization of massive data
• Easy and intuitive visualization features rendered by drag & drop, detail view, etc.
• Verifies the analysis results through a real-time dashboard
• Interoperable with third-party UIs via various types of visualization APIs
• Uses analytics tools and shares analysis results using notebooks
• Various methods of analysis available through interoperability with Jupyter and Zeppelin
• Provides AI-assisted analytics features* (such as automatic content analysis)
• Big OLAP cube and dynamic schema to minimize the ETL cost
• Offers an environment that enables users to ingest, process, and handle data directly with
self-service data preparation support
• Sub-second processing of time-series data
• Low-cost storage of full data sets and improved storage speed based on HDFS
• Real-time data ingestion
• Upload and sequential analysis of local computer data
• Ingests various types of source data (Text, CSV, DBMS)

* The features offered are under implementation.

[Figure: Architecture and advantages of the metatron Discovery module]

Architecture:
• Application: Workbook, Notebook, Workbench, Data preparation
• Analytics links: Metis, self-developed Lib, Spark Lib, Jupyter
• Data Processing Engine: OLAP Engine (Druid), SQL Engine (Spark SQL), Zookeeper, Hadoop,
Deep Storage
• Enhanced Druid Engine: DPA (Accelerator)
• File system

Advantages:
• Intuitive analysis: Easy analysis of big data through an intuitive interface, using the
end-to-end feature that runs from the data preparation stage right through to the analysis chart.
• Big OLAP Cube: Minimizes the ETL cost and supports speed improvement and schema changes by
combining various types of dimension data with large amounts of fact data to create one Big Mart.
• Sub-Second Processing Engine: Transfers data to in-memory, local storage, and deep storage
over time to enable fast response even for massive data larger than one terabyte.
• Enhanced Druid Engine: Adds key functions and performance improvements to build a
metatron-specific Druid engine instead of simply using the open-source version.

2. Structure of metatron Discovery
metatron Discovery imports data from a data source ingested in the metatron operation server or
other external data sources, analyzes it using various advanced analysis functions, and outputs the
result as various charts and reports. To utilize this module, you must understand the overall
structure shown below.

[Figure: Overall structure of metatron Discovery, showing the Data Source (File, Database),
Data Preparation & Ingestion, Data Storages (Big Mart), Data Analytics & Visualization
(Workspace, Workbook, Notebook, Workbench), Data monitoring, and User Permission and Account
areas]

Source Data
Data imported when creating the data source. The current metatron version supports three source
types: file, database, and staging DB.
Data Preparation & Ingestion
Refines and processes data imported from the source, and then ingests it into metatron's Hive
storage or exports it to local files.
Data Storage
Saves data to metatron's Druid engine, which is based on multidimensional OLAP. Data saved in
this form is easy to visualize and analyze. (See Section 3.2 for more information.)

Data Analytics & Visualization
Searches, analyzes, and visualizes data stored in data storage.
• Workspace: A space for creating and managing workbooks, workbenches, and notebooks.
• Workbook: Analyzes multidimensional, time-stamped data sources stored in the Druid
engine, and visualizes and reports the results.
• Notebook: Performs advanced analysis such as deep learning or machine learning by
linking external analytics tools including R, Python, and Scala.
• Workbench: Performs SQL-based analysis.

Data monitoring
Monitors various data logs generated in metatron's Staging DB (internal Hive DB) and workbench
(external DB).
User Permission and Account
Adds and deletes metatron Discovery users, or manages user permission.

3. Relational OLAP vs. Multidimensional OLAP
Various cutting-edge systems and methods for big data analysis are applied in metatron Discovery.
To use Discovery properly, you must understand one of them in particular: the "multidimensional
OLAP system" built on the Druid engine. This section briefly describes the characteristics and
limits of the relational OLAP system, which has long been the most popular system for data
collection and management, and explains how Discovery's multidimensional OLAP system resolves
these limits.

3.1 Characteristics and limits of the relational OLAP system
A relational OLAP system is the most popular method of collecting and managing data. The figure
below shows an example of the star schema structure, the most frequently used structure in a
relational OLAP system.

As shown in the figure, a basic star schema is composed of many dimension tables and a small
number of fact tables.
• Detailed data on each dimension of the schema is stored in a dimension table. In the
example shown in the figure above, the three dimension tables contain data relating to
the product, the employee, and the customer, respectively. Each of these is called a
"dimension" because you can use it as a criterion for classifying data (e.g., quantity sold
by each employee and quantity purchased by each customer).
• In the fact table, a record containing the details of an event is added each time a new
event occurs. Each record is connected to the dimension tables through reference IDs.
The figure below shows a simple example table based on the schema structure above to aid
understanding.

Although the products sold, the sales staff, and the customer making the purchase are recorded
as a simple ID in each record of the fact table, each of these IDs can be used for searching detailed
information if necessary, since they are connected with a unique dimension table. For example, if
we search the product table, the P2 product in the first record indicates that the applicable product
type is pencil, the unit price for sale is 200 KRW, the unit price for supply is 100 KRW, and the
supplier is Supplier 1. Using such data collection method offers the advantage of being able to
use the storage area effectively by storing only the minimum amount of data in the fact table.
When analyzing data stored this way, however, the reference IDs of the relational OLAP system
become a cause of reduced data accessibility and speed. This is because, for every record in the
fact table used for analysis, the original data behind each reference ID must be looked up one
by one in the corresponding dimension table. If there are 100,000 records to analyze and 4
reference IDs per record, 100,000 x 4 = 400,000 lookups are required, placing that much load on
the data processing device. Moreover, only a limited number of skilled people understand the
schema structure of the database properly, so only they are able to write such queries.
The multidimensional OLAP system described in the next section overcomes the limitations of
such a legacy structure by saving all data into one or a small number of tables.
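
The lookup cost described above can be sketched in Python. This is only an illustration under stated assumptions, not part of metatron: apart from the P2 product row, the table contents are hypothetical, and the names `dimensions` and `resolve` are introduced only for this sketch.

```python
# Dimension tables keyed by reference ID. Only the P2 row follows the manual's
# example; the employee and customer rows are hypothetical.
dimensions = {
    "product_id": {"P2": {"type": "pencil", "sale_price_krw": 200,
                          "supply_price_krw": 100, "supplier": "Supplier 1"}},
    "employee_id": {"E1": {"name": "Employee 1"}},
    "customer_id": {"C1": {"name": "Corp. A"}},
}

# A fact record stores only reference IDs plus the measured value.
fact_record = {"product_id": "P2", "employee_id": "E1", "customer_id": "C1",
               "quantity": 16}


def resolve(record, dimensions):
    """Replace each reference ID with its dimension row and count the lookups."""
    resolved = {"quantity": record["quantity"]}
    lookups = 0
    for id_field, table in dimensions.items():
        resolved[id_field[:-3]] = table[record[id_field]]  # strip the "_id" suffix
        lookups += 1
    return resolved, lookups


resolved, lookups = resolve(fact_record, dimensions)
print(resolved["product"]["type"], lookups)

# The manual's estimate: 100,000 records with 4 reference IDs each.
total_lookups = 100_000 * 4
print(total_lookups)
```

Every record costs one lookup per reference ID, so the work grows with both the number of records and the number of dimension tables.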

3.2 Multidimensional OLAP system and Druid engine
Unlike the star-schema-based relational OLAP system described earlier, the multidimensional
OLAP system uses one or a small number of multidimensional tables. metatron Discovery
implements the OLAP system using a Druid engine. Below is a Druid data table using the same
objects as the data used in the relational OLAP system discussed in the previous section.

In a Druid engine-based multidimensional table, a new record is added each time an event occurs
in the same way as with the fact table of the star schema system. The time is also recorded as a
time stamp. As shown in the example above, however, all data items are saved to one table at
once without using the IDs that refer to specific dimension tables. Since the time stamp is also
considered to be a dimension (time dimension), the data mart above has a total of 4 dimensions.
This storage method increases data volume because values are duplicated (for example, the same
product name is repeated in many records). Nevertheless, it enables faster and more effective
analysis for the following reasons:
• Since all data is stored in one or a small number of tables, and can be extracted by simply
selecting only the columns required for analysis at the time, there is far less data to load
than with the relational structure.
• The cost of understanding reference relations between tables and writing and executing
queries is reduced accordingly.
• In particular, since it is easy to store and process the data mart by distributing it over many
nodes (based on a Hadoop cluster in the case of the Druid engine), terabytes of data can be
processed and analyzed promptly and effectively.
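
As a rough illustration of the first point, the sketch below keeps every value in the row itself and extracts data by choosing columns only. The rows and the helper `select_columns` are hypothetical and are not Druid's actual storage format or API.

```python
# Hypothetical flat (denormalized) records: every value is stored in the row
# itself, so no reference IDs or dimension tables are involved.
flat_table = [
    {"timestamp": "2018-06-09", "product_type": "pencil", "employee": "Employee 1",
     "customer": "Corp. A", "quantity": 16},
    {"timestamp": "2018-10-16", "product_type": "pencil", "employee": "Employee 2",
     "customer": "Corp. A", "quantity": 2},
]


def select_columns(rows, columns):
    """Keep only the requested columns of each row."""
    return [{name: row[name] for name in columns} for row in rows]


projection = select_columns(flat_table, ["timestamp", "quantity"])
print(projection)
```

The repeated values ("pencil", "Corp. A") show the extra storage the text mentions, while the projection needs no lookups in other tables.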

Next, we describe in more detail how data analysis takes place in a multidimensional OLAP
system. A three-dimensional cube model is sometimes used to explain multidimensional data mart
analysis.

Each axis of the cube represents one dimension of the OLAP-type data mart. Since real data marts
are rarely limited to three dimensions, the cube model merely illustrates the analysis process
with the data mart simplified to three dimensions; it cannot represent an actual data mart of
four or more dimensions.
Although the data mart used as an example in this section also has four dimensions, we will
ignore the "employee" dimension for convenience and use only three dimensions to explain the
fundamentals of the analysis process with a three-dimensional cube model. The following is an
analysis process for checking the sales of a specific product to a specific customer during a
specific period.
1. First, pick out only the data with product type "pencil" from the data mart.

2. Then select the purchases of Corp. A.

3. Finally, select the data for 2018. The following example shows that Corp. A purchased 16
units of Product A on June 9, 2018, and 2 units on October 16, 2018.

metatron Discovery provides many functions for analysis of multi-dimensional data with an
intuitive GUI.
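
The three selection steps above can be sketched as successive filters over a flat table. This is a minimal sketch under stated assumptions: the rows are hypothetical (only the two 2018 quantities for Corp. A follow the text), and the list comprehensions stand in for what the GUI does.

```python
from datetime import date

# Hypothetical data mart rows (employee dimension omitted, as in the text).
# The two Corp. A rows in 2018 follow the quantities quoted in step 3.
rows = [
    {"date": date(2018, 6, 9), "product_type": "pencil", "customer": "Corp. A", "quantity": 16},
    {"date": date(2018, 10, 16), "product_type": "pencil", "customer": "Corp. A", "quantity": 2},
    {"date": date(2017, 3, 2), "product_type": "pencil", "customer": "Corp. A", "quantity": 5},
    {"date": date(2018, 6, 9), "product_type": "eraser", "customer": "Corp. A", "quantity": 7},
    {"date": date(2018, 7, 1), "product_type": "pencil", "customer": "Corp. B", "quantity": 9},
]

step1 = [r for r in rows if r["product_type"] == "pencil"]   # 1. product dimension
step2 = [r for r in step1 if r["customer"] == "Corp. A"]     # 2. customer dimension
step3 = [r for r in step2 if r["date"].year == 2018]         # 3. time dimension

result = [(r["date"].isoformat(), r["quantity"]) for r in step3]
print(result)
```

Each step narrows the cube along one axis, so the order of the filters does not change the final result.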

Part 2

Management


Contents
1. Management Outline
2. Data Storage Management
2.1 Data source
2.1.1 Data source management home screen
2.1.2 Data source details
2.1.3 Create a data source
2.2 Data connection
2.2.1 Data connection management home
2.2.2 Create a new data connection
3. Data Preparation Management
3.1 Dataset
3.1.1 Dataset management home
3.1.2 Dataset details
3.1.3 Create a new dataset
3.2 Dataflow
3.2.1 Dataflow management home screen
3.2.2 Create a new dataflow
3.2.3 Dataflow details
3.2.4 Edit a Wrangled dataset
3.2.5 Create a data snapshot
3.3 Data snapshot
3.3.1 Data snapshot management home
3.3.2 Data snapshot details
4. Notebook Management
4.1 Notebook server
4.1.1 Search notebook server list
4.1.2 Register a new notebook server
5. Data Monitoring Management
5.1 Log analysis
5.1.1 Log analysis home screen
5.2 Job log
5.2.1 Job log home
5.2.2 Job log details
5.3 Data lineage
5.3.1 Data lineage management home screen
5.3.2 Data lineage details

1. Management Outline

As shown above, data used by the three metatron Discovery analysis modules (workbook, notebook,
and workbench) is imported from various types of source data, engines, and storage systems.
Therefore, this dataflow needs to be standardized and managed, and different types of source
data need to be linked.
The management menu is used to manage the processes shown in black above. These processes are
divided into the following four sub-menus.
Chapter 2 Data Storage (Data Source and Data Connection): Describes the procedure for
managing data sources and the database connection required for data analysis and
visualization.
Chapter 3 Data Preparation (Dataset, Dataflow, and Data Snapshot): Describes the
procedure for preparing the imported original data.
Chapter 4 Notebook management: To use the notebook module, the server must be linked
with Jupyter or Zeppelin. This chapter describes the procedure to set and manage these
servers.
Chapter 5 Data Monitoring (Log Analysis, Job Log and Data Lineage): Describes the
procedure to monitor and track data usage.

2. Data Storage Management
Data to be used in metatron Discovery can be linked in the following two ways.
• Load the original data into the internal engine ‘Druid’ and store it in units of ‘data source’.
(See Paragraph 2.1.)
• Link an SQL database directly. (See Paragraph 2.2.)
Data storage provides functions to link such data to metatron and manage it.

2.1 Data source
In metatron Discovery, ‘data source’ means the unit of data collected by the Druid engine. Each
data source is saved in the Druid database as a table. These data sources are used in ‘workbook’
or ‘notebook’ for analysis and visualization.

2.1.1 Data source management home screen
In this screen, a user can register, edit or search data sources.

❶ Data type: Searches data sources by type of original data.
• All: Displays all data sources regardless of original data type.
• File: Displays data sources created from a file imported from a local PC.
• Database: Displays data sources created from data imported from a database.
• Staging DB: Displays data sources created from data imported from metatron's internal
Hive database.

❷ Ingestion type: Searches data sources by how their data is ingested.
• All: Displays all data sources regardless of ingestion type.
• Ingested data: Displays data sources whose data is saved directly in the metatron server.
• Linked data: Displays data sources that import data from the linked database whenever
necessary.

❸ Status: Searches by availability of the data source saved in the active data storage.
• All: Displays all data sources regardless of their availability.
• Enabled: Displays data sources that completed data ingestion and are available in a
workbook or workbench.
• Disabled: Displays data sources that completed data ingestion but are unavailable
because of a problem in a Druid process.
• Preparing: Displays data sources whose data ingestion is still in progress.

❹ View only open data: Searches only data sources allowed in all workspaces.
❺ Time: Standard time applied when searching a data source. Select created date or
updated date, and select All/Today/Last 7 days/Specific day as a time range.

❻ Search by data source name: Searches a registered data source by its name.
❼ Number of data: Displays the number of data sources searched from the current list.
❽ Create a new data source: Click to create a new data source.
❾ Data source list: Displays data sources that meet the established criteria. Click a data
source to see its details. (See Paragraph 2.1.2.)

❿ Delete: Hover the mouse over the data source to display the trash icon. Click the icon
to delete the data source.

2.1.2 Data source details
Click a data source listed on the data source management home to view its attributes. When
viewing the following areas, keep in mind that each data source is saved in metatron's
internal Druid database as a table, and includes a timestamp column due to the time-series
nature of Druid.

A. Common Top Area

❶ Name: Name of the data source. Click to edit it.
❷ Description: Description of the data source. Click to edit it.
❸ Information last updated by: Displays the user who last updated the data source.
❹ Delete: Click this icon to display the menu to delete the data source.
❺ Tab selection area: Each tab displays a specific attribute group of the data source.
Depending on the type of data source, not all four tabs may be displayed. For more
information on each tab, refer to the relevant paragraph below.

B. Information tab
In the information tab, a user can view details of the data source and edit basic matters. This
tab is composed of ‘Data information’, ‘Permission’ and ‘Ingestion information’.
Data information area
The area displays basic information of the data source.

❶ Data type: Type of original data imported when creating the data source.
❷ Status: Displays the availability of the data source.
❸ Size: Displays the size of the data source.
❹ Duration: Displays the time range of the timestamp included in the data source.
❺ Granularity settings: Displays the granularity intervals defined when creating the data
source.
• Segment Granularity: Defines the time unit by which data is divided into segments for
storage. Druid requires this to operate in a distributed environment.
• Query Granularity: Defines the minimum time unit available for analysis. Results are
generated in advance at this minimum unit so that queries respond faster.

❻ Histogram: A graph displaying the amount of data stored for each time interval, in KB. This
histogram is available because the Druid engine records a timestamp for each record.
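
The two granularity settings can be pictured as rounding a record's timestamp down to different time units. The following is a minimal sketch under stated assumptions: the `truncate` helper is hypothetical (it is not a Druid or metatron API), and "day" and "hour" are only example choices for the segment and query granularity.

```python
from datetime import datetime


def truncate(ts, granularity):
    """Round a timestamp down to the start of its hour, day, or month."""
    if granularity == "hour":
        return ts.replace(minute=0, second=0, microsecond=0)
    if granularity == "day":
        return ts.replace(hour=0, minute=0, second=0, microsecond=0)
    if granularity == "month":
        return ts.replace(day=1, hour=0, minute=0, second=0, microsecond=0)
    raise ValueError(f"unsupported granularity: {granularity}")


event_time = datetime(2018, 6, 9, 14, 37, 5)
segment_bucket = truncate(event_time, "day")   # which stored segment the row falls into
query_bucket = truncate(event_time, "hour")    # finest time unit kept for analysis
print(segment_bucket, query_bucket)
```

With these example settings, all records from the same day are stored together, and no query can distinguish events within the same hour.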

Permission area
In this area, a user can check and set the workspace where the data source can be used.

❶ Allow all workspaces to use this data source check box: Click this check box to use
the data source in all workspaces.

❷ Edit: Used to designate a specific workspace to allow the data source. This button will
disappear if the data source is set as open data.

❸ Number of shared workspaces: Displays the number of workspaces where the data
source can be used.
Ingestion information area
This area displays ingestion information for the original data and the origin of the data
source.

C. Grid data tab
The grid data tab displays details of data in the data source.

❶ Search data: Searches by the contents of a data table.
❷ Role: Searches a column of the data table by All/Dimension/Measure.
❸ Type: Searches a column of the data table by field type.
❹ Row: Displays the number of records registered in the data table.
❺ Download CSV: Download data displayed on the screen as a CSV file.

D. Column detail tab
In the column detail tab, a user can view details of each column of the data source table.
Column search/setting
In the top of the column detail tab, a user can use the UI to filter columns by certain criteria.
Columns that meet the criteria will be displayed on the left. Also, a user can edit column
settings.

❶ Search data: Searches by column name.
❷ Role: Searches a column of the data table by All/Dimension/Measure.
❸ Type: Searches a column of the data table by field type.
❹ View all: Cancels all search criteria set in the Search data, Role and Type options, and
returns to View All Columns.

❺ Configure schema: Click to display a window to edit the settings of the current column.
❻ Data source name: Displays the name of the data source (relevant 'table').
❼ Column list: Lists table columns.

Column basic information area
This area displays the various basic information of the selected column.

❶ Schema information: Displays the attributes of the selected column.
❷ Display setting: Displays the meta data information of the selected column.
❸ Summary: Displays the value configuration information of the selected column.
❹ Statistics: Displays the minimum value and the maximum value among the values
entered in the selected column.
Histogram area
A graph displaying the capacity to save data at each time by Kbyte. This histogram is enabled
because the Druid engine must record a timestamp for each record.

E. Monitoring tab
Displays the log where the data source is used.
Transaction change area
Displays the trend of data source transactions over time.

Data size change area
Displays the trend of data source capacity over time.

Query distribution area
Displays information on the query performed for the data source.

❶ Query distribution by user (during the last week): Displays a graph of query
distribution by the user who performed the query in the past week.

❷ Query distribution by elapsed time (during the last week): Displays a graph of query
distribution by elapsed time for query performance in the past week.

Query log area
Used to view a detailed history of each performed query.

❶ Query date: Sets the time range of the queries to be viewed.
❷ Query type: Filters the performed queries by type.
❸ Result: Filters queries by result (succeeded or failed).
❹ Query list: Lists queries that meet the established criteria.
❺ More: Click to view the query statement.

2.1.3 Create a data source
Click the ‘+Create a new data source’ button in the top-right of the data source management
screen to create a new data source. First, select the type of original data.

❶ File: Imports a file saved on the user's local PC to create a data source. Only Excel files
(xls, xlsx) and comma-separated files (csv) can be imported (for more information,
see Paragraph A).

❷ Database: Imports data from an external database to create a data source. Currently,
metatron Discovery supports Oracle, MySQL, Hive, Presto, and TIBERO (for more
information, see Paragraph B).

❸ Staging DB: Creates a data source based on the data imported from the metatron
internal Hive database.

❹ Real-time: This function is not currently supported.
❺ Data snapshot: This function is not currently supported.
❻ metatron engine: Migrate a data source saved in metatron version 1.0.

A. Create a data source with file
1. In the screen to select the type of original data, select ‘File’.
2. This imports the file to be used as a data source from the user's local PC. Click the
Import button and select the file, or drag a file to the screen. Once the file is imported,
click 'Next'.

3. From the file, select the sheet to be included in the data source.

❶ File name: Name of the imported file. The user can import another file.
❷ File sheet list: Displays sheets included in the imported file. Select the sheet to be used
as data.

❸ File sheet name: Name of the currently selected sheet.
❹ Capacity: Capacity of the imported file.
❺ Column: Number of columns in the imported file.
❻ Row: Number of rows in the imported file. Input the number of rows to be displayed on
the screen.

❼ Type: Displays the number of data types recognized from each column. Data type by
column can be edited in a later screen.

❽ Use the first row as the column name check box: Select it to use the contents of the
first row of the file as the column name. If you don't select it, a new row to enter the
column name will be created.
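
The effect of the "use the first row as the column name" check box can be sketched with Python's standard `csv` module. This is an illustration only: `read_sheet` is a hypothetical helper, and the generated `column_N` names are an assumption standing in for the names a user would type into the new row.

```python
import csv
import io


def read_sheet(text, first_row_is_header):
    """Split CSV text into column names and data rows."""
    rows = list(csv.reader(io.StringIO(text)))
    if first_row_is_header:
        return rows[0], rows[1:]
    # Placeholder names are an assumption; in the UI the user enters the names.
    names = [f"column_{i + 1}" for i in range(len(rows[0]))]
    return names, rows


sample = "product,quantity\npencil,16\npencil,2\n"
print(read_sheet(sample, first_row_is_header=True))
print(read_sheet(sample, first_row_is_header=False))
```

When the box is not selected, the first row stays in the data, so a file that does contain a header row would end up with that header as a data record.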

4. Set the schema to be applied to the data source.
Search columns that meet the criteria

❶ Search by column name: Search a column in the imported file by its name.
❷ Role: Searches a column in the imported file by All/Dimension/Measure.
❸ Recommended filter: Search columns where the priority filter is applied.
❹ Type: Searches a column in the imported file by field type.
❺ Column list area: Displays columns that meet the established criteria (for more
information, see ‘Edit and delete columns in a batch’ below).

❻ Individual column setting area: This area is used to set the attributes of the column
selected from the column list (for more information, see ‘Edit attributes of individual
column’ below).

❼ Timestamp setting: Sets the timestamp that must exist in the data source. The
metatron engine is a time-series engine that requires a time value to save the data
source. Therefore, a user can designate a time-type column of the existing data as a
timestamp, or create a time-type column based on the current time value, and
designate it as a timestamp.
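
The two timestamp options in ❼ can be sketched as follows. This is a minimal sketch under stated assumptions: `add_timestamp` and the field name `timestamp` are hypothetical and do not reflect how metatron names or stores the column internally.

```python
from datetime import datetime


def add_timestamp(rows, time_column=None, now=None):
    """Return copies of rows with a 'timestamp' field.

    If time_column is given, its value is used as the timestamp; otherwise the
    current time (or the supplied 'now') is used for every row.
    """
    current = now if now is not None else datetime.now()
    stamped = []
    for row in rows:
        new_row = dict(row)
        new_row["timestamp"] = row[time_column] if time_column else current
        stamped.append(new_row)
    return stamped


orders = [{"ordered_at": datetime(2018, 6, 9), "quantity": 16}]
print(add_timestamp(orders, time_column="ordered_at"))
print(add_timestamp(orders, now=datetime(2024, 1, 1)))
```

Designating an existing time column keeps the real event times, while the current-time option gives every row the same ingestion-time value, which limits time-based analysis.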

Edit and delete columns in a batch
In the data column list area, a user can edit or delete columns in a batch. Select the check
boxes of the columns to be deleted or whose type is to be changed, set the desired operation
by referring to the following description, and click ‘Apply’.

❶ Change/delete type: Select the operation to be applied to the selected columns.
❷ Dimension/measure: Change the role of the columns to dimension or measure.
• If you select dimension, you can change the data type to
Character/Boolean/Integer/Decimal/Date/Time/Latitude/Longitude.
• If you select measure, you can change the data type to Integer/Decimal.

❸ Column type: Select the new type of columns.

Edit attributes of an individual column
This area is used to edit attributes of the columns selected from the column list.

❶ Name: Name of the selected column.
❷ Number of data rows: Indicates the number of data rows displayed on the screen.
❸ Role: Displays the dimension/measure (role applied to the column). A user can change it
from this menu.

❹ Type: Displays the data type of the column. A user can change it to another data type.
Available data types are different based on the selected role (dimension/measure).

❺ Recommended filter: If there is a column containing a massive amount of data, a
timeout error may be generated due to a long loading time. The data administrator can
register such a column as a recommended filter to facilitate the configuration of the
dashboard or chart.
 Apply as a priority filter: Select whether to apply the selected column as a priority
filter.
 A user can select only a single item. The time is generated when the column is
applied as a priority filter. Check if the data is too big.
 Edit order: Click to change the priority ranking of the columns to which the
recommended filter is currently applied.

❻ Missing: Set how to handle Null values in the column.
 Replace with: Replaces a Null value with the value inputted here.
 Discard: Discards a Null value.
 Not set: Displays a Null value as is. However, a Null value of a data source's
timestamp will be discarded.
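
The three Null-handling options above can be sketched in Python as follows. This is only an illustration, not the metatron implementation: the function name `handle_missing`, the policy strings and the sample rows are hypothetical, and the sketch assumes that ‘Discard’ drops the whole row that contains the Null value.

```python
def handle_missing(rows, column, policy, replacement=None):
    """Return new rows with Null (None) values of `column` handled per `policy`."""
    if policy == "replace":
        # Replace with: substitute the given value for every Null.
        return [
            {**row, column: replacement if row[column] is None else row[column]}
            for row in rows
        ]
    if policy == "discard":
        # Discard: drop rows whose value is Null (assumption: the whole row goes).
        return [row for row in rows if row[column] is not None]
    if policy == "not_set":
        # Not set: keep Null values as they are.
        return [dict(row) for row in rows]
    raise ValueError(f"unknown policy: {policy}")


rows = [{"city": "Seoul", "sales": 10}, {"city": "Busan", "sales": None}]
replaced = handle_missing(rows, "sales", "replace", replacement=0)
discarded = handle_missing(rows, "sales", "discard")
unchanged = handle_missing(rows, "sales", "not_set")
```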

5. Set data source ingestion and click Next.

❶ Granularity settings: Set two types of granularity.
 Segment granularity: Defines the time unit by which data is split into segments for
storage. Druid needs this to distribute the data across the nodes of a distributed
environment.
 Query granularity: Defines the minimum time unit available for analysis. Results are
generated in advance at this unit so that queries respond faster.

❷ Rollup: Rollup summarizes data by dimension at ingestion time. A summarization
rule can be the calculation of a sum along a hierarchy or the application of a formula
such as ‘profit=sales-expenses’. In short, rollup is an ingestion option that reduces the
amount of stored data.

❸ Advanced setting (Opt.): Configures ingestion tuning. Enter the tuning options in JSON
format in the text box. Example:
{"maxRowsInMemory": 75000,
"maxOccupationInMemory": -1,
"maxShardLength": -2147483648,
"leaveIntermediate": false,
"cleanupOnFailure": true,
"overwriteFiles": false,
"ignoreInvalidRows": false,
"assumeTimeSorted": false}
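
To make the effect of query granularity and rollup more concrete, the following Python sketch truncates each timestamp to the query granularity and sums the measures of rows that share the same truncated time and dimension values. It is a simplified illustration, not the Druid implementation; the function names `truncate` and `rollup` and the sample columns are hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def truncate(ts, granularity):
    """Truncate a timestamp to the given query granularity."""
    if granularity == "hour":
        return ts.replace(minute=0, second=0, microsecond=0)
    if granularity == "day":
        return ts.replace(hour=0, minute=0, second=0, microsecond=0)
    raise ValueError(f"unsupported granularity: {granularity}")


def rollup(rows, dimensions, measures, granularity):
    """Sum the measures of rows sharing a truncated timestamp and dimension values."""
    totals = defaultdict(lambda: dict.fromkeys(measures, 0))
    for row in rows:
        key = (truncate(row["timestamp"], granularity),) + tuple(
            row[d] for d in dimensions
        )
        for m in measures:
            totals[key][m] += row[m]
    return dict(totals)


rows = [
    {"timestamp": datetime(2018, 1, 1, 9, 5), "region": "east", "sales": 100, "expenses": 40},
    {"timestamp": datetime(2018, 1, 1, 9, 40), "region": "east", "sales": 50, "expenses": 10},
    {"timestamp": datetime(2018, 1, 1, 10, 15), "region": "east", "sales": 70, "expenses": 30},
]
summary = rollup(rows, ["region"], ["sales", "expenses"], "hour")

# A formula such as profit = sales - expenses can then be applied per summarized row.
profit = {key: m["sales"] - m["expenses"] for key, m in summary.items()}
```

With an hourly query granularity, the two 09:xx rows collapse into a single summarized row, so three input rows are stored as two.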

6. Confirm the information in the data set from the imported file, enter the name and
description, and click Done to create a data source. It may take a few seconds or minutes
according to the amount of data, because data is ingested from the original data to the
metatron internal engine (Druid).

7. Move to the data source management home to check the new data source from the
screen. During ingestion, the status will be displayed as Preparing. Once it is finished, the
status will be changed to Enabled. After that, a user can use the data source.

B. Create a data source using a database or Staging DB
1. From the original data type selection screen, select ‘database’ or ‘Staging DB’.
2. If you select ‘database’, load an existing data connection or enter connection information
for a new database, in the data connection setting screen as follows. This step is skipped
if you select ‘Staging DB’.

❶ Load a data connection: Used to select a saved data connection. Select it to
automatically load access information on the database connected to the relevant data
connection. However, you must verify the connection by clicking the ‘Test’ button.

❷ DB type: Select the type of database to be connected. Currently 5 database types are
supported. (Oracle, MySQL, Hive, presto, TIBERO)

❸ Ingestion type: Select the data source's method to ingest data.
 Ingested data: Displays data sources ingested by saving data in the metatron server
directly.
 Linked data: Displays data sources that import data from the linked database whenever
necessary.

❹ Host: Select the value of the host to be connected.
❺ Port: Enter the number of the port to be connected.
❻ SID/Catalog: For Oracle and TIBERO, enter SID. For presto, enter Catalog value.
❼ Username: Enter the username of the database.
❽ Password: Enter the password of the database.
❾ Test: If you enter all fields, the Test button will be activated. Click it to view the
connection test result at the bottom of the button. ‘Valid connection’ means it is
normal, and ‘Invalid connection’ means it is abnormal.

❿ Save as a new data connection: To link a new database rather than using an existing
data connection, you must save connection information as a new data connection. Enter
the name to be used in this step.
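
The reason the SID/Catalog field depends on the DB type can be seen from the commonly used JDBC URL formats of these databases. The following Python sketch builds such a URL from the fields above. It is an assumption-laden illustration: this manual does not describe how metatron Discovery builds its connection string internally, and the function name `jdbc_url` and the sample hosts are hypothetical.

```python
def jdbc_url(db_type, host, port, sid_or_catalog=None):
    """Build a typical JDBC URL from the connection fields (illustrative only)."""
    db = db_type.lower()
    if db in ("oracle", "tibero"):
        # Oracle and TIBERO thin drivers identify the instance by SID.
        if not sid_or_catalog:
            raise ValueError("SID is required for Oracle and TIBERO")
        return f"jdbc:{db}:thin:@{host}:{port}:{sid_or_catalog}"
    if db == "presto":
        # Presto addresses data through a catalog.
        if not sid_or_catalog:
            raise ValueError("Catalog is required for presto")
        return f"jdbc:presto://{host}:{port}/{sid_or_catalog}"
    if db == "mysql":
        return f"jdbc:mysql://{host}:{port}"
    if db == "hive":
        return f"jdbc:hive2://{host}:{port}"
    raise ValueError(f"unsupported DB type: {db_type}")


oracle_url = jdbc_url("Oracle", "db.example.com", 1521, "ORCL")
presto_url = jdbc_url("presto", "presto.example.com", 8080, "hive")
hive_url = jdbc_url("Hive", "hive.example.com", 10000)
```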

3. Select data. You can select a table from the account of linked databases, or write a query
statement by yourself. You can only designate a table when creating a data source using
Staging DB.
Table
Select the database and table name, confirm the data to be saved and click ‘Next’.

❶ Select database: Select a database linked with the selected data connection.
❷ Select schema: Select a table in the selected database.

Query
Write a query statement to import the data you want, and click ‘Run’ to display data at the
bottom. Confirm the data and click ‘Next’.

4. The subsequent procedures are the same as the procedures to import data for the file.
See Step 3 and onwards of Paragraph 2.1.3 A. However, when creating a data source
from a database, you must configure additional settings to set ingestion.

❶ Ingestion settings: Set data ingestion.
 Ingest once: Select to save data only once.
 Ingest periodically: Save data on a regular basis.

❷ Scope of data ingestion: Set the scope of data ingestion.
 Retrieves the entire data: Ingests all data when running data ingestion on a regular
basis.
 Receives only the first: Ingests only the specified amount of data, counted from the start of the data.

C. metatron engine (v1.0) data source migration
1. From the original data type selection screen, select ‘metatron engine’.
2. When data sources created by metatron V1.0 are listed on the left as follows, select the
check box of the data sources to be migrated to V2.0.

3. Click ‘Done’ to migrate the selected data sources.

2.2 Data connection
metatron Discovery can link an external database directly. To link an external database, you must
create and manage a data connection containing access information for that database. Register
such a data connection to avoid entering access information again for a new database. The
purpose of a data connection is divided as follows.
 For general purpose: Linked to the database when creating a new data source based on a
database (see 2.1.3 Paragraph B).
 For workbench: Linked to the database to be used in workbench (see ‘Part 6 Workbench’).

2.2.1 Data connection management home
In this screen, a user can register, edit or search a database connection.

❶ Type: Search a data connection by its use.
 All: Displays all data connections regardless of their use.
 General: Displays general purpose data connections. Used to create a new data
source.
 Workbench: Displays data connections for workbench. Used to create a new
workbench.

❷ DB type: Search a data connection by database type
(Oracle/MySQL/Hive/presto/Tibero).

❸ Time: Standard time applied when searching a data connection. Select created date or
updated date, and select All/Today/Last 7 days/Specific day as a time range.

❹ Search by data connection name: Searches a registered data connection by its name.
❺ Number of data connections: Displays the number of data connections searched from
the current list.

❻ Create a new data connection: Click to create a new data connection.
❼ Data connection list: Displays data connections that meet the established criteria. Click
a data connection to edit its setting.

❽ Delete: Hover the mouse over the data connection to display the trash icon. Click the
icon to delete the data connection.

2.2.2 Create a new data connection
A. Create a new general purpose data connection
A general purpose data connection is used to create a new data source from metatron
Discovery. Enter the following fields to create a new one.

❶ Type: Purpose of the data connection to be created. To create a general purpose data
connection, select ‘General’.

❷ DB type: Currently 5 database types are supported. (Oracle, MySQL, Hive, presto,
TIBERO)

❸ Host: Select the value of the host to be connected.
❹ Port: Enter the number of the port to be connected.
❺ SID/Catalog: For Oracle and TIBERO, enter SID. For presto, enter Catalog value.
❻ Username: Enter the username of the database.
❼ Password: Enter the password of the database.
❽ Test: If you enter all fields, the Test button will be activated. Click it to view the
connection test result at the bottom of the button. ‘Valid connection’ means it is
normal, and ‘Invalid connection’ means it is abnormal.

❾ Name of data connection: Enter the name of the data connection to be created.

B. Create a new data connection for workbench
Data connection for workbench is used to link a database to be used in workbench. Enter the
following fields to create a new one.

❶ Type: Purpose of the data connection to be created. To create a data connection for
workbench, select ‘Workbench’.

❷ DB type: Currently 5 database types are supported. (Oracle, MySQL, Hive, presto,
TIBERO)

❸ Host: Select the value of the host to be connected.
❹ Port: Enter the number of the port to be connected.
❺ SID/Catalog: For Oracle and TIBERO, enter SID. For presto, enter Catalog value.
❻ Account input method: Set how to log in to use the connection in workbench.
 Enter by manager: Log in using information entered by the user when creating a data
connection.

 Use user account: Log in using user account information registered in metatron
Discovery.
 Input on connection: Log in by entering information whenever connecting to the
workbench.

❼ Username: Enter the username of the database.
❽ Password: Enter the password of the database.
❾ Test: If you enter all fields, the Test button will be activated. Click it to view the
connection test result at the bottom of the button. ‘Valid connection’ means it is
normal, and ‘Invalid connection’ means it is abnormal.

❿ Name of data connection: Enter the name of the data connection to be created.
⓫ Permission: Designates a workspace to allow the use of the new data connection.
 Allow all workspaces to use this data source check box: Click this check box to use
the data connection in all workspaces.
 Edit: Used to designate a specific workspace to allow the data connection. This button
will disappear if the data connection is set as open data.
 Number of shared workspaces: Displays the number of workspaces where the data
connection can be used.

3. Data Preparation Management
Data preparation is a process to prepare (refine, integrate, arrange and transform) the original data
before creating a data source for analysis. This is not a process to refine the existing data, but a
process to define preparation procedures for records to be added later, in order to automatically
refine every new piece of data according to the defined rules. The menu is composed as follows.
 Dataset: Manages the connection with the original data. (See Paragraph 3.1.)
 Dataflow: Defines the procedure to prepare a dataset. (See Paragraph 3.2.)
 Save snapshot: Identifies data inputted and prepared during the dataflow and saves it as a
file or database table. (See Paragraph 3.3.)

3.1 Dataset
Dataset means the original data for data preparation. A dataset is divided into two types.
 Imported dataset: A dataset containing the original data without defining the preparation
procedure.
 Wrangled dataset: A dataset with a defined preparation procedure. The preparation
procedure can be defined in the dataflow screen.

Management

Data Preparation Management

3.1.1 Dataset management home
In this screen, a user can register, edit or search a dataset.

❶ Type: Search a dataset by its type.
 All: Displays datasets regardless of their type.
 Imported dataset: Displays Imported datasets.
 Wrangled dataset: Displays Wrangled datasets.

❷ Search by dataset name: Search a registered dataset by its name.
❸ Number of data: Displays the number of datasets searched from the current list.
❹ Create a new dataset: Click to register a new dataset.
❺ Dataset list: Displays datasets that meet the established criteria. Click a dataset to
see its details. (See Paragraph 3.1.2.)

❻ Delete: Hover the mouse over the dataset to display the trash icon. Click the icon to
delete the dataset.

3.1.2 Dataset details
Click a dataset listed in the dataset management home to view various attributes of that dataset.

❶ Name: Name of the dataset. Click to edit it.
❷ Description: Description of the dataset. Click to edit it.
❸ Type: Displays the type (Imported/Wrangled) of the dataset.
❹ Summary: Number of records and columns of the original data table linked to the
dataset.

❺ Data: Information of columns of the original data table linked to the dataset.
❻ Used in: Displays the name of the dataflow that uses the dataset, the number of Imported
and Wrangled datasets used in that dataflow, and the time and User ID of the last update of
that dataflow. The details screen of an Imported dataset displays all dataflows linked
to that dataset.

3.1.3 Create a new dataset
Click the ‘+ Create a new dataset’ button in the top-right of the dataset management home to
create a new dataset and register data to be used for preparation.

❶ File: Imports files saved on the user's local PC and registers them as a dataset. Only
Excel files (xls, xlsx) and comma-separated files (csv) can be imported.

❷ Database: Imports data from an external database and registers it as a dataset.
❸ Staging DB: Registers data imported from the metatron internal Hive database as a
dataset.

A. Create a dataset using a file
1. In the screen to select the type of original data, select ‘File’.
2. Import the file to be used as a dataset from the user's local PC. Click the Import button
and select the file, or drag a file to the screen. Once the file is imported, click 'Next'.

3. Data in the imported file will be displayed as follows. Confirm that the data is what you
want, and click ‘Next’. If it is not what you want, import the file again.

❶ Name: Name of the file.
❷ Import or drop file here: Used to import a new file.
❸ Data preview: Used to preview the data of the file.
❹ Column delimiter: Enter the delimiter of the column. The default value is a comma (,).
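
The role of the column delimiter in the data preview can be illustrated with Python's standard `csv` module. This is a sketch only and says nothing about how metatron Discovery parses files; the function name `preview` and the sample text are hypothetical.

```python
import csv
import io
from itertools import islice


def preview(text, delimiter=",", max_rows=5):
    """Split delimited text into a header row and up to `max_rows` data rows."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader)
    rows = list(islice(reader, max_rows))
    return header, rows


semicolon_text = "id;name\n1;alpha\n2;beta\n"

# With the matching delimiter, the columns are separated correctly.
header_ok, rows_ok = preview(semicolon_text, delimiter=";")

# With a delimiter that does not match the file, each line stays one column.
header_bad, rows_bad = preview(semicolon_text, delimiter=",")
```

If the preview shows every line as a single column, the delimiter entered in this field most likely does not match the file.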

4. Confirm information on the file type data imported to create a dataset, enter the name
and description, and click 'Done' to create a new dataset.

B. Create a dataset using a database or Staging DB
1. From the original data type selection screen, select ‘database’ or ‘Staging DB’.
2. If you select ‘database’, load an existing data connection or enter connection
information for a new database, in the data connection setting screen as follows. This
step is skipped if you select ‘Staging DB’.

❷ DB Type: Select the type of database to be connected. Currently 7 database types are
supported. (Oracle, MySQL, PostgreSQL, Hive, presto, Phoenix, TIBERO)

❸ Host: Select the value of the host to be connected.
❹ Port: Enter the number of the port to be connected.
❺ SID/Catalog/DB name: Enter SID for Oracle and TIBERO. Enter Catalog for presto. Enter
DB name for PostgreSQL.

❻ Username: Enter the username of the database.
❼ Password: Enter the password of the database.
❽ Test: If you enter all fields, the Test button will be activated. Click it to view the
connection test result at the bottom of the button. ‘Valid connection’ means it is
normal, and ‘Invalid connection’ means it is abnormal.

❾ Save as a new data connection: To link a new database rather than using an existing
data connection, you must save connection information as a new data connection. Enter
the name to be used in this step.

❶ Select database: Select a database linked with the selected data connection.
❷ Select schema: Select a table in the selected database.

Query
Write a query statement to import the data you want, and click ‘Run’ to display data at the
bottom. Confirm the data and click ‘Next’.

4. Finally, confirm the data configured so far, enter the name and description of the dataset
to be created, and click ‘Done’ to create the dataset. This dataset will be an Imported
dataset.

3.2 Dataflow
Dataflow is the process of preparing a dataset to secure data quality and to put the data into a
form optimized for analysis or visualization.

3.2.1 Dataflow management home screen
In this screen, a user can manage and prepare the original data.

❶ Search: Search the saved dataflows by name.
❷ Number of data: Displays the number of dataflows searched from the current list.
❸ Add a new flow: Click to add a new dataflow.
❹ Dataflow list: Displays dataflows that meet the established sort criteria. Each number
next to the icon assigned to each dataflow in the ‘Dataset’ column represents the
number of Imported datasets and Wrangled datasets used in the dataflow, respectively.
Click a dataflow to see its details. (See Paragraph 3.2.3.)

❺ Delete: Hover the mouse over the dataflow to display the trash icon. Click the icon to
delete the dataflow.

3.2.2 Create a new dataflow
Click the ‘Add a dataflow’ button in the dataflow management home to display the screen to
create a dataflow.
1. Select datasets from the list to be used to create a dataflow.

❶ Search by dataset name: Search a registered dataset by its name.
❷ Dataset list: Displays registered datasets. If you search by dataset name, only datasets
containing the string in their name will be searched.

❸ Create a dataset: Click this button to display a dialog window to move to the screen to
create a new dataset.

2. Confirm the selected datasets, enter the name and description of the dataflow to be
created, and click ‘Done’ to create the dataflow.

3.2.3 Dataflow details
Displays the relation between all datasets in the dataflow, and displays each dataset for editing.
The dependencies between datasets, created by functions such as JOIN/UNION, are displayed
as a diagram.
Outline of the dataflow details screen

❶ Name: Name of the dataflow.
❷ Description: Description of the dataflow.
❸ Information last updated by: Displays the user who performed the last dataflow
update.

❹ Delete: Click this icon to display the menu to delete the dataflow.
❺ Add a new dataset: Click to display a new window to add a new dataset to the
dataflow. Click the dataset to be added and click ‘Done’ to add that dataset.

❻ Imported dataset: Displays the number of Imported datasets included in the dataflow.
❼ Wrangled dataset: Displays the number of Wrangled datasets included in the dataflow.
❽ (Imported dataset icon): Represents each Imported dataset included in the dataflow.
❾ (Wrangled dataset icon): Represents each Wrangled dataset created by wrangling an
Imported dataset. ‘Wrangling’ is a process used to transform an unrefined Imported dataset
into easy-to-use data using various tools (see Paragraph 3.2.4). A Wrangled dataset is
registered as a data source, and can be used for data analysis and visualization.

❿ Information on the selected dataset: Click a dataset icon on the screen to display an
overview of that dataset in this area. In this area, a user can confirm, copy or delete the
selected dataset, and edit the rules applied to this dataset. For more information on this
area, see the following description.

Information displayed when an Imported dataset is selected

❶ Name: Name of the selected dataset.

❷ Description: Description of the selected dataset.

❸ Create a new dataset: Click to create another
Wrangled dataset from the selected Imported
dataset.

❹ Data preview: Displays basic information on the
selected dataset.

❺ Created on: Displays the creation time of the selected
dataset and the creator.

❻ Updated on: Displays the user who last updated the
selected dataset.

❼ Delete this data: Delete the selected dataset.

Information displayed when a Wrangled dataset is selected

❶ Name: Name of the selected dataset.
❷ Description: Description of the selected dataset.

❸ Edit rules: Click to open a window to edit rules for the
selected dataset (for more information on this window,
see the next paragraph).

❹ Copy: Click to create another Wrangled dataset by
copying the selected dataset.

❺ Data preview: Displays basic information on the
selected dataset.

❻ Rule list: Displays rules currently applied to the selected
dataset.

❼ Created on: Displays the creation time of the selected
dataset and the creator.

❽ Updated on: Displays the user who last updated the
selected dataset.

❾ Delete this data: Delete the selected dataset.

3.2.4 Edit a Wrangled dataset
Editing a Wrangled dataset is the most basic task of data preparation. A user can define
rules to transform a dataset, apply them and, based on the result, append, update or delete
rules, or undo and redo changes, to build up the transformation rules quickly.
Dataset transformation rules and result of application
The currently applied transformation rules are listed on the top-right, and the result of
applying those rules is displayed on the left.

❶ Name: Name of the selected Wrangled dataset.
❷ Search: Search data in the selected Imported dataset to display only data containing
the searched string.

❸ Snapshot: Saves all data collected and prepared based on the result of the selected
Wrangled dataset as a file or database table, to use as a data source (see Paragraph
3.3).

❹ Wrangling result data: Displays Wrangling result data by applying rules to the dataset.
❺ Rule list: Displays rules applied to the dataset.

Edit and add a dataset transformation rule
At the bottom of the screen, there is an area to edit or add a dataset transformation rule.

The following is a description of each command's function.
Command        Function
rename         Changes the column name
drop           Removes the column
delete         Deletes rows that meet the conditional expression
keep           Deletes rows except those that meet the conditional expression
set            Sets the result of the given expression as the value of the column
derive         Similar to 'set' above, but leaves the original column and creates a new column
header         Sets the column values in the designated row as the names of the columns at once
replace        Applies a conversion expression to the column values at once
settype        Changes the type of column (Integer, Float, String, etc.)
extract        Finds the pattern and makes it a new column
split          Splits one column into multiple columns
merge          Merges multiple columns into one column
join           Transforms into a result joined with another dataset
union          Merges contents of different datasets that share the same schema
sort           Sorts by the selected column value
move           Moves a column up or down
aggregation    Creates a new column based on the result of the grouping operation
flatten        Creates each element as a new column when an array column is designated
pivot          Creates columns whose name is each value of the designated column, and performs the grouping operation
unpivot        Designates column names as column values, and converts them into multiple rows
countpattern   Creates a new column based on the number of times the given pattern appears in the designated column
nest           Designates multiple columns and makes an array or map using their values to create a new column
unnest         Extracts a specific element from an array or map to create a new column
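
The effect of a few of these commands can be sketched in Python on a small list of rows. This is not the metatron rule syntax, which is not shown in this manual; the helper functions below only mirror the described behavior of rename, drop, keep, delete and derive, and all names and sample data are hypothetical.

```python
def rename(rows, old, new):
    """rename: change a column name."""
    return [{(new if k == old else k): v for k, v in row.items()} for row in rows]


def drop(rows, column):
    """drop: remove a column."""
    return [{k: v for k, v in row.items() if k != column} for row in rows]


def keep(rows, condition):
    """keep: delete rows except those that meet the condition."""
    return [row for row in rows if condition(row)]


def delete(rows, condition):
    """delete: delete rows that meet the condition."""
    return [row for row in rows if not condition(row)]


def derive(rows, new_column, expression):
    """derive: keep the original columns and add a new computed column."""
    return [{**row, new_column: expression(row)} for row in rows]


rows = [
    {"item": "pen", "price": 2, "qty": 10, "memo": ""},
    {"item": "ink", "price": 5, "qty": 0, "memo": "n/a"},
]

# Rules are applied in order, each one working on the result of the previous rule.
result = rename(rows, "item", "product")
result = drop(result, "memo")
result = keep(result, lambda r: r["qty"] > 0)
result = derive(result, "amount", lambda r: r["price"] * r["qty"])
```

Because each rule takes the previous result as its input, removing or reordering a rule in the rule list changes everything that follows it.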

3.2.5 Create a data snapshot
Click the ‘snapshot’ button in the Wrangled dataset edit screen to save a snapshot.

A. Data snapshot type: File
Select ‘File’ as the data snapshot type to save the created snapshot in the server as a file.

❶ Data snapshot type: Select the type of storage where the data snapshot is saved.
❷ File format: Select the file format in which to save the data snapshot.
❸ Compression: Select the compression type to save the data snapshot.
❹ Full Data ETL engine: Select the engine to be used for data ETL (extraction,
transformation and loading).
 Embedded engine: Suitable for processing small data.
 Spark: Suitable for processing large data.

B. Data snapshot type: HIVE
Select ‘HIVE’ as the data snapshot type to save the created snapshot in metatron's internal
Hive as a database.

❶ Data snapshot type: Select the type of database where the data snapshot is saved.
❷ DB name: Select the database where the data snapshot is saved.
❸ Table name: Select the table name where the data snapshot is saved.
❹ File format: Select the file format in which the data snapshot is saved in Hive.

❺ Compression: Select the compression type to save the data snapshot.
❻ Overwrite method: Select how to save tables with the same name.
 Overwrite: Overwrites the snapshot in the existing table.
 Append: Add snapshot data to the existing table.

❼ Partition keys: Designate columns to use as partition keys.

❽ Full Data ETL engine: Select the engine to be used for data ETL (extraction,
transformation and loading).
 Embedded engine: Suitable for processing small data.
 Spark: Suitable for processing large data.

❾ Choose columns to use as partition keys: Select whether to use a partition key
function.

3.3 Data snapshot
Data snapshot is a function that applies the preparation rules defined in a Wrangled dataset to the
ingested data and saves the result. The saved data can be used as a data source for analysis and
visualization. In this way, the original data passes through data preparation and is saved in a
form optimized for analysis.

3.3.1 Data snapshot management home
This screen displays the name and creation time of data snapshots, the name of a created
dataflow and Wrangled dataset, success/failure, elapsed time, etc.

❶ Status: Search the data snapshot by its success/failure status.
 All: Displays all data snapshots regardless of the success/failure of data snapping.
 Success: Displays snapshots where data snapping is successful.
 Failure: Displays snapshots where data snapping has failed.

❷ Search by name: Search a created data snapshot by its name.
❸ Number of data: Displays the number of data snapshots searched from the current list.
❹ Data snapshot list: Displays data snapshots that meet the established criteria. Click a
data snapshot to see its details. (See Paragraph 3.3.2.)

❺ Delete: Hover the mouse over the data snapshot to display the trash icon. Click the icon
to delete the data snapshot.

3.3.2 Data snapshot details
Click a snapshot listed in the data snapshot management home to display information about
the data snapshot such as the Hive table name, number of rows and size, and to check the
actual contents of data as follows.

❶ Name: Dataset used by the snapshot.
❷ Success rate: Rate of records saved in the Hive table successfully.
 Valid: Rate of records saved in the Hive table successfully because all column formats
match.
 Mismatched: Rate of records that failed to be saved in the Hive table because column
formats do not match.
 Missing: Rate of missing records.

❸ Snapshot data history: Displays the history of data snapped in the snapshot.
❹ Database: Name of the Hive database where the snapshot is saved.
❺ Table: Name of the Hive table where the snapshot is saved.
❻ Summary: Number of rows (records) and columns of data snapped in the snapshot.
❼ Elapsed time: Elapsed time to snap the data.
❽ Created on: Time when the snapshot was created.
❾ Dataflow: Displays the dataflow used to create the snapshot. Click to search/edit the
dataflow.

❿ Dataset: Displays the Wrangled dataset used to create the snapshot. Click to search/edit
the dataset.

⓫ Origin imported dataset: Displays the Imported dataset (original data) of the snapshot.
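
The three rates listed under ‘Success rate’ can be computed as in the following Python sketch. It assumes that every record falls into exactly one of the three groups (valid, mismatched, missing), which this manual does not state explicitly; the function name `success_rates` and the sample counts are hypothetical.

```python
def success_rates(valid, mismatched, missing):
    """Return each group's share of all records as a percentage."""
    counts = {"valid": valid, "mismatched": mismatched, "missing": missing}
    total = sum(counts.values())
    if total == 0:
        # No records at all: report every rate as zero.
        return {name: 0.0 for name in counts}
    return {name: 100 * count / total for name, count in counts.items()}


rates = success_rates(valid=90, mismatched=8, missing=2)
```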

4. Notebook Management
metatron Discovery provides functions to utilize external analysis tools such as Jupyter and Zeppelin
through a module called a notebook. To utilize these functions, it is necessary to link the server
where the external analysis tool is installed. This server is called the ‘notebook server’.

4.1 Notebook server
In this menu, a user can register a new notebook server or search/edit registered notebook servers.

4.1.1 Search notebook server list
Used to view a list of available notebook servers. A user can add a notebook server, and edit
or delete its connection information.

❶ Type: Search a registered notebook server by the linked external analysis tool
(Jupyter/Zeppelin).
 All: Displays all notebook servers regardless of the external analysis tool.
 Jupyter: Displays notebook servers linked to Jupyter.
 Zeppelin: Displays notebook servers linked to Zeppelin.

❷ Search by server name: Search a registered notebook server by its name.
❸ Number of data: Displays the number of notebook servers searched from the current list.
❹ Add a server: Click to create a new notebook server.
❺ Delete selections: Deletes notebook servers selected in the left check boxes from the
notebook server list.

❻ Notebook server list: Displays notebook servers that meet the established criteria. Click
a notebook server to edit its setting.
Management

Notebook Management

4.1.2 Register a new notebook server
Click the ‘Add a server’ button in the notebook management screen to display a pop-up window
to register a notebook server as follows.

❶ Type: Select the external analysis tool installed in the notebook server to be registered.
❷ Host: Enter a host value of the notebook server to be registered.
❸ Port: Enter the port number of the notebook server to be registered.
❹ Name: Enter the name of the notebook server to be registered (mandatory).
❺ Description: Enter the description of the notebook server to be registered.

5. Data Monitoring Management
Data monitoring is a function to monitor various data logs generated in metatron's Staging DB
(internal Hive DB) and workbench (external DB).

5.1 Log analysis
5.1.1 Log analysis home screen
This screen displays various statistics related to the performance of queries in metatron
Discovery. Click a query from the search results to move to the View detail screen of that query.
(See Paragraph 5.2.2.)

❶ Performance Start Time: Enter the period of the data log to be included in the query
analysis.

❷ Search User ID: Search by ID of the user who executed the query.
❸ Query success/failure rate: Displays the success/failure rate of queries performed in
metatron.

❹ Query frequency by user: Graph showing the number of queries performed by each user.
Click a bar to view the Job Log of that user.

❺ In order of longest: Displays the performed queries in order of longest elapsed time.
Management

Data Monitoring Management

❻ Amount of scan data: Displays the performed queries in order of highest amount of
scanned data.

❼ Frequency of successful queries: Displays the performed queries in order of highest
frequency of success.

❽ Frequency of failed queries: Displays the performed queries in order of highest
frequency of failure.

❾ Total memory usage: Displays the performed queries in order of highest total memory
usage.

❿ Total CPU usage: Displays the performed queries in order of highest total CPU usage.
⓫ Resource usage by queue: Displays the amount of resources consumed in each YARN
queue of the Hadoop environment.
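
The statistics on this screen are aggregations over the query log. As a rough illustration only, the following Python sketch computes a success rate, the query count per user, and the longest-running queries from a few made-up log records. The record fields (`user`, `status`, `elapsed_sec`) and the `summarize` function are assumptions for this example, not metatron's actual log schema or API.

```python
from collections import Counter

# Hypothetical query log records; the field names are illustrative assumptions.
QUERY_LOG = [
    {"user": "alice", "status": "SUCCESS", "elapsed_sec": 12.5},
    {"user": "bob", "status": "FAIL", "elapsed_sec": 3.0},
    {"user": "alice", "status": "SUCCESS", "elapsed_sec": 47.2},
    {"user": "carol", "status": "SUCCESS", "elapsed_sec": 1.1},
]


def summarize(log, top_n=2):
    """Return the success rate, query count per user, and the slowest queries."""
    total = len(log)
    succeeded = sum(1 for rec in log if rec["status"] == "SUCCESS")
    per_user = Counter(rec["user"] for rec in log)
    slowest = sorted(log, key=lambda rec: rec["elapsed_sec"], reverse=True)[:top_n]
    return {
        "success_rate": succeeded / total if total else 0.0,
        "queries_per_user": dict(per_user),
        "slowest": slowest,
    }


summary = summarize(QUERY_LOG)
print(summary["success_rate"])      # fraction of successful queries
print(summary["queries_per_user"])  # how often each user ran a query
```

The other rankings on this screen (scanned data, memory, CPU) would follow the same pattern with a different sort key.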


5.2 Job log
Used to search the history of all queries performed in metatron.

5.2.1 Job log home
In this screen, a user can search the query history by applying custom search criteria.


❶ Status: Search queries performed by success/failure status.
❷ Limited Elapsed time: Search for queries with a long elapsed time. The user can select the
threshold time.

❸ Performance Start Time: The reference period for searching queries. This period is
applied to the time at which each query started.

❹ Search by Job or application: Search within the current query history by query
statement or Application ID.

❺ Number of data: Displays the number of queries found in the current list.
❻ Job list: Displays queries that meet the established search criteria. Click a query to
see its details. (See Paragraph 5.2.2.)
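
The search criteria above act as filters that are combined on the query history. The following Python sketch shows one way such filtering could work; the record fields, the sample jobs, and the `search_jobs` function are hypothetical and only illustrate the idea, not metatron's implementation.

```python
from datetime import datetime

# Hypothetical job log entries; field names and values are assumptions.
JOBS = [
    {"job": "SELECT * FROM sales", "status": "SUCCESS", "elapsed_sec": 5,
     "start": datetime(2018, 6, 1, 9, 0)},
    {"job": "SELECT count(*) FROM logs", "status": "FAIL", "elapsed_sec": 120,
     "start": datetime(2018, 6, 1, 10, 30)},
    {"job": "INSERT INTO report SELECT * FROM sales", "status": "SUCCESS",
     "elapsed_sec": 300, "start": datetime(2018, 6, 2, 8, 15)},
]


def search_jobs(jobs, status=None, min_elapsed_sec=0, start_from=None,
                start_to=None, keyword=None):
    """Return the jobs that satisfy every criterion that was supplied."""
    result = []
    for job in jobs:
        if status is not None and job["status"] != status:
            continue
        if job["elapsed_sec"] < min_elapsed_sec:
            continue
        if start_from is not None and job["start"] < start_from:
            continue
        if start_to is not None and job["start"] > start_to:
            continue
        if keyword is not None and keyword.lower() not in job["job"].lower():
            continue
        result.append(job)
    return result


slow_successes = search_jobs(JOBS, status="SUCCESS", min_elapsed_sec=60)
print(len(slow_successes))  # number of matching queries
```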


5.2.2 Job log details
Click a query listed in the job log home to view various information and the history of that
query.


❶ Status: Displays the success/failure status of the query.
❷ Job name: Performed query statement.
❸ Start time: Time when the query was started.
❹ Elapsed time: Time taken to perform the query.
❺ User: ID of the user who performed the query.
❻ Connection: For a query performed in workbench, displays information of the relevant
data connection.

❼ Query History: For a query performed in workbench, the history of the latest 5 queries
performed in the database and their results. Click Detail to display the query statement
in a new window.

❽ Plan: Displays the execution plan of the query.


5.3 Data lineage
In data lineage, a user can analyze the Hive logs of queries performed in metatron's internal
Hive database to check the flow of data ETL (extraction, transformation and loading). Workflows
are performed through query statements, so the data lineage tracking function is based on the
query log.
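
To illustrate why a query log is sufficient to reconstruct lineage, the sketch below extracts source-to-target table relations from `INSERT ... SELECT ... FROM ...` statements with regular expressions. This is a deliberately simplified, assumption-based example and not how metatron parses Hive logs; the statements, table names and patterns are hypothetical, and real HiveQL would require a proper SQL parser.

```python
import re

# Hypothetical Hive query log; the statements and table names are made up.
QUERY_LOG = [
    "INSERT INTO TABLE daily_sales SELECT * FROM raw_orders "
    "JOIN products ON raw_orders.pid = products.id",
    "INSERT OVERWRITE TABLE sales_report SELECT region, sum(amount) "
    "FROM daily_sales GROUP BY region",
]

# Simplified patterns: the table written to, and the tables read from.
TARGET_RE = re.compile(r"INSERT\s+(?:INTO|OVERWRITE)\s+TABLE\s+(\w+)", re.IGNORECASE)
SOURCE_RE = re.compile(r"(?:FROM|JOIN)\s+(\w+)", re.IGNORECASE)


def lineage_edges(statements):
    """Return (source_table, target_table) pairs found in the statements."""
    edges = []
    for sql in statements:
        target = TARGET_RE.search(sql)
        if target is None:
            continue  # the statement does not write to a table
        for source in SOURCE_RE.findall(sql):
            edges.append((source, target.group(1)))
    return edges


edges = lineage_edges(QUERY_LOG)
for source, target in edges:
    print(f"{source} -> {target}")
```

Chaining these pairs (for example `raw_orders` to `daily_sales` to `sales_report`) gives the kind of table-level flow that the data lineage screens visualize.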

5.3.1 Data lineage management home screen

❶ Search: Input a keyword to search data. The keyword is determined by the entity type
selected on the right.

❷ Entity type: Select entity type to track workflow.
 Table: Displays tables related to the performed workflow.
 Column: Displays columns related to the performed workflow.
 SQL: Displays the SQL query statement related to the performed workflow.
 Workflow: Displays performed workflows.

❸ Entity list: Displays entities that meet the established search criteria. Click an entity to
see its details. (See Paragraph 5.3.2.)


5.3.2 Data lineage details
Click an entity listed in the data lineage management home to view information of the workflow
linked to that entity.

❶ Original data table box: Displays the name of the table containing the original data
used to perform each query in the workflow. Click the button to view the columns stated
in the query statement. Select the box to view information on the relevant table, and
select a column to highlight the linked query and the linked columns of other tables.

❷ SQL query list box: Displays the SQL query statement performed for the workflow. Click
the button to view the contents of the clause that defines the relation between the
original data column and the result data column. Select the box to view information on the
relevant SQL, and select a column to highlight the linked query and linked columns.

❸ Result data table box: Displays the name of the table containing the result data of
workflow queries. Click the button to view the columns stated in the query statement.
Select the box to view information on the relevant table, and select a column to
highlight the linked query and the linked columns of other tables.

❹ View details: Click a box in the workflow to display the details of the table or SQL on the
right of the screen. Select a table box to view the metadata of the relevant table,
and select an SQL box to view information on the query statement.

❺ Mini-map: Used to zoom in/zoom out/move the screen.


Part 3

Using Workspace


Contents
1. Overview of Workspace
2. Workspace Management Home
2.1 Composition of the workspace management home screen
2.2 Folder item
2.3 Entity item
2.4 Copy/Move/Delete folder and entity
3. Shared Workspace List
4. Create a Shared Workspace
5. Set Access Permission for a Shared Workspace
5.1 Permission schema
5.1.1 Search permission schema
5.1.2 Change permission schema setting
5.2 Set shared member & group


1. Overview of Workspace

Workspace is a space to store metatron Discovery analysis modules such as workbook, notebook
and workbench. Workspace is divided into a personal workspace and a shared workspace.
 Personal workspace: A private workspace assigned to each Discovery member. Only the
member can access his/her personal workspace.
 Shared workspace: A public workspace for multiple users. This space is used to share the
process and results of analysis with other users. An owner or administrator of a shared
workspace can grant various levels of access to Discovery members.

This part is divided as follows.
Chapter 2 Workspace management home: Describes the composition and UI of the
workspace home screen.
Chapter 3 Shared workspace list: ‘Shared workspace list’ is a page that lists the shared
workspaces accessible to a user. The user can filter the list and search for a desired workspace.
Chapter 4 Create a shared workspace: Describes how to create a new shared workspace.
Chapter 5 Set access permission for a shared workspace: To collaborate in a shared
workspace, each user needs a suitable role and permissions. This menu is used to set
the permission level of each role and assign these roles to each user.

2. Workspace Management Home
In this screen, a user can perform the management functions in metatron Discovery's analysis
modules (workbook, notebook and workbench).

2.1 Composition of the workspace management home screen
The following is a description of the overall composition of the workspace management home.

❶ Main menu button: Click this button to open a panel to access another workspace.
❷ Workspace information: Displays the name and description of the workspace. If the
logged-in user is the owner of the workspace, an 'Owner' icon will be displayed next to
the name of the workspace.

❸ Status of registered entities: Displays the number of entities registered in the
workspace by entity type.

❹ Data source: Displays the number of data sources used in the workspace. Click this area
to show a list of these data sources.

❺ Workspace List: Click this button to show a list of shared workspaces. (See Chapter 3
for more information.)

❻ Creation information: Displays the creation date and creator of the workspace.


❼ More: Click to display a menu with the following items.
 Edit the name and description: Edit the name and description of the workspace.
 Set shared member & group: Set the user and group who can access the workspace.
(See Paragraph 5.2 for more information.)
 Set notebook server: Set the access information for the external analysis tool used in
the notebook module.
 Set permission schema: Set the access permission of each user role for the workspace.
(See Paragraph 5.1 for more information.)
 Change owner: Change the owner of the workspace.
 Delete workspace: Delete the workspace.

❽ Workspace path: Displays the current location in the workspace. Click on an upper
folder listed in the path to move to that folder.

❾ Create a folder: Click to create a new folder in the current location.
❿ Sort/align entity list:
 Search: Search for an entity or folder in the workspace by name.
 Entity type combobox: Display only entities of a specific type (workbook, notebook
or workbench).
 Sort: Sort folders or entities by name or update time.
 View format: Select grid view or list view as a format to list entities in the workspace.

⓫ Folder list: Displays folders in the current location that meet search criteria. Click a
folder to move to that folder. (For more information on each folder, see Paragraph 2.2.)

⓬ Entity list: Displays entities in the current location that meet search or sort criteria. Click
an entity to move to the home screen of that entity. (For more information on each
entity, see Paragraph 2.3.)

⓭ Select/copy/move/delete entity: Select all entities, or copy, move or delete an entity.
(See Paragraph 2.4 for more information.)

⓮ Create an entity: Buttons used to create a specific type of entity in the workspace. (See
Chapter 2 of Part 4, Chapter 3 of Part 5, and Chapter 2 of Part 6, respectively, for more
information on how to create.)


2.2 Folder item
Each folder item is displayed as follows.

❶ Check box: Used to select the folder. The user can clone, move or delete the selected
folder.

❷ Name: Name of the folder.
❸ Edit: Click to edit the name of the folder. To display this button, you must hover the
mouse over the folder item.

❹ Delete: Click to delete the folder. To display this button, you must hover the mouse
over the folder item.

2.3 Entity item
Each entity item is displayed as follows.

❶ Check box: Used to select the entity. The user can clone, move or delete the selected
entity.

❷ Entity type: Displays the type of entity (workbook/notebook/workbench).
❸ Delete: Click to delete the entity. To display this button, you must hover the mouse
over the entity item.

❹ Name: Name of the entity.
❺ Update time: Displays the time of the latest entity update.
❻ Number of data sources/dashboards: This is an exclusive area for the workbook entity.
 The number next to the [icon] icon is the number of data sources linked to the workbook.
 The number next to the [icon] icon is the number of dashboards registered to the workbook.


2.4 Copy/Move/Delete folder and entity
A user can copy, move or delete folders and entities in the workspace. Select a folder or entity to
be copied, moved or deleted. Operation buttons will be activated in the lower-left corner of the
workspace home screen.

❶ Select all: Select all items in the current folder list and entity list.
❷ Clone Workbook: This is an exclusive function for the workbook. Click this button to
clone the selected workbook.

❸ Move selections: Move the selected folder or entity. A workbook can be moved to
another workspace. Other items can be moved to another folder in the same workspace.
However, a workbook cannot be moved when it is selected together with another entity.

❹ Delete: Delete the selected folder or entity.
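
The move rules described above can be summarized as a small decision function. The following Python sketch is only an interpretation of that description: the type names and the `move_options` function are hypothetical, and treating a folder like any other non-workbook item is an assumption.

```python
def move_options(selected_types):
    """Return where the selection may be moved, or None if it cannot be moved.

    selected_types: list of item types such as "workbook", "notebook",
    "workbench" or "folder" (hypothetical type names for this sketch).
    """
    if not selected_types:
        return None  # nothing is selected
    has_workbook = "workbook" in selected_types
    has_other = any(item != "workbook" for item in selected_types)
    if has_workbook and has_other:
        return None  # a workbook selected with another item cannot be moved
    if has_workbook:
        return "another workspace"
    return "another folder in the same workspace"


print(move_options(["workbook"]))
print(move_options(["notebook", "folder"]))
print(move_options(["workbook", "notebook"]))
```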


3. Shared Workspace List
The shared workspace list screen is used to view the list of all shared workspaces accessible by the
logged-in user and to move to a specific workspace. This screen can be accessed via two methods.
 Click the button at the top-left of the Discovery screen to open the main panel, and click
‘Workspace list >>’.
 Click ‘Workspace List’ at the top-right of the workspace home screen.

The shared workspace list screen is composed as follows.

❶ Number of shared workspaces: Displays the number of shared workspaces in the list.
❷ Add a shared workspace: Click this button to move to the screen to add a shared
workspace. (See Chapter 4 for more information on how to add.)

❸ Move to a personal workspace: Click this button to move to the personal workspace of
the logged-in user.

❹ Search: Search a shared workspace by its name.
❺ Favorites: Display only workspaces designated as favorites.
❻ Anonymous only: Display only workspaces set as public.
❼ I'm the manager: Displays a list of workspaces where the logged-in user is the
administrator.

❽ Name ascending/descending: Sort shared workspaces by name in ascending/descending
order.

❾ Workspace list: Displays workspaces that meet the established sort criteria. Click a
workspace to move to that workspace.

4. Create a Shared Workspace
A new shared workspace is created as follows.

1. Click the button in the shared workspace list to open the screen to create a new
shared workspace.
2. Refer to the descriptions below and fill in the entries.

❶ Name: Enter a name for the shared workspace.
❷ Description: Enter a description of the shared workspace.
❸ Workspace permission schema: Set the permission schema for each role of the shared
workspace.
 Use a schema preset: Import the permission schema defined by the administrator.
 Use a custom schema: Define a new permission schema. (For more information on
how to define a new permission schema, see Paragraph 5.1.1.)
3. Click ‘Done’ to finish creating a workspace.


5. Set Access Permission for a Shared Workspace
Setting the access permission for a shared workspace is composed of the following two steps.
 Set the access permission for each user role (See Paragraph 5.1 ‘Set permission schema’.)
 Grant a suitable user role for each user or user group (See Paragraph 5.2 ‘Set shared
member & group’.)

5.1 Permission schema
5.1.1 Search permission schema
Click the icon at the top-right of the shared workspace home screen and click ‘Set permission
schema’ to view the defined permission schema as follows.

In the above example, Manager, Editor and Watcher are defined as user roles. As shown in this
example, a ‘permission schema’ is a set of user roles, each of which has its own access
permissions defined.
Attributes of each column for each user role are as follows.
Default role
A new user or user group is granted the user role designated as the default role.
Permission for each type of workbook/notebook/workbench entity
 View: A user can view data by accessing the entity of the relevant type.
 Create: A user can create, edit or delete entities of the relevant type.
 Edit any: A user can edit or delete entities of the relevant type created by another
user.
Workspace permission
 Create folders: A user can create, edit or delete folders in the workspace.
 Set config.: A user can edit the name and description of the workspace, and change
the workspace permission schema.
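
A permission schema can be pictured as a small data structure. The following Python sketch models the attributes listed above with plain dictionaries. The role names are taken from the example, but the permissions granted to each role, the key names, and the helper functions are assumptions made for illustration and do not reflect metatron's internal representation.

```python
ENTITY_TYPES = ("workbook", "notebook", "workbench")


def role(name, entity_perms, create_folders=False, set_config=False, default=False):
    """Build one user role; the same entity permissions apply to every entity type."""
    return {
        "name": name,
        "default": default,
        "entities": {etype: set(entity_perms) for etype in ENTITY_TYPES},
        "create_folders": create_folders,
        "set_config": set_config,
    }


# The permissions granted to each role here are assumptions for this sketch.
SCHEMA = [
    role("Manager", {"view", "create", "edit_any"}, create_folders=True, set_config=True),
    role("Editor", {"view", "create"}, create_folders=True),
    role("Watcher", {"view"}, default=True),
]


def default_role(schema):
    """Return the role granted to a new user or user group."""
    return next(r for r in schema if r["default"])


def is_allowed(schema, role_name, entity_type, permission):
    """Check whether a role holds a permission for an entity type."""
    for r in schema:
        if r["name"] == role_name:
            return permission in r["entities"].get(entity_type, set())
    return False


print(default_role(SCHEMA)["name"])
print(is_allowed(SCHEMA, "Editor", "notebook", "edit_any"))
```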


5.1.2 Change permission schema setting
Click the ‘Change schema’ button in the permission schema search screen to display the screen
to change the defined permission schema as follows.

Click the ‘Select Role Set’ combobox on the right to display the schemata defined by the
administrator. To define new user roles, use ‘Custom RoleSet’ at the bottom of the list. Select
a schema to display the following screen. (If you select ‘Custom RoleSet’, you must first define
the permissions for each user role. Click the button to move to the permission setting screen,
and set the permissions for each user role with reference to the descriptions in Paragraph 5.1.1.)

(Figure: Current permission schema / New permission schema)

In this example, each user role of the current permission schema is substituted with the user
role defined in the new permission schema. Hover the mouse over the icon next to the
name of each user role to display the permission assigned to the user role.
Click ‘Done’ to finish setting the permission schema.
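
Changing the schema amounts to mapping each current role to a role in the new schema and then re-assigning members accordingly. The Python sketch below illustrates this substitution under stated assumptions: the new role names, the member assignments, and the `apply_new_schema` function are all hypothetical.

```python
# Hypothetical mapping from roles in the current schema to roles in the new one.
ROLE_MAPPING = {"Manager": "Admin", "Editor": "Contributor", "Watcher": "Viewer"}

# Hypothetical member assignments under the current schema.
MEMBERS = {"alice": "Manager", "bob": "Editor", "carol": "Watcher"}


def apply_new_schema(members, mapping):
    """Return member assignments with each current role replaced by its mapped role."""
    missing = {current for current in members.values() if current not in mapping}
    if missing:
        raise ValueError(f"no replacement defined for roles: {sorted(missing)}")
    return {user: mapping[current] for user, current in members.items()}


print(apply_new_schema(MEMBERS, ROLE_MAPPING))
```

Rejecting an incomplete mapping up front is a design choice of this sketch, so that no member is left with a role that no longer exists.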


5.2 Set shared member & group
Click the icon at the top-right of the shared workspace home screen, and click ‘Set shared
member & group’ to display the screen to set a shared member and group as follows. In this
example, each user role defined in the permission schema is assigned to each user or user group.
Assign the user role referencing the following description, and click ‘Done’ to finish setting
workspace access permission.


❶ Select unit of user role assignment
 Member tab: Assign a user role to each user.
 Group tab: Assign a user role to each user group. (A user group can be granted
administrator permission.)

❷ User roles: Click to display the information for the permission schema (the definition of
permission by user role) as a pop-up window.

❸ Member/group list: Lists users (groups in the case of the group tab) registered in
Discovery. Click a user (group) in the list to add it to the role assignment area on the
right. Click an added user (group) to remove it from the area on the right.

❹ Assign user role: Click this combobox to display user roles defined in the active
permission schema. Select a role to be assigned to the user (group).


Part 4

Using Workbook


Contents
1. Workbook Outline
2. Create a Workbook
3. Create a Dashboard
4. Manage a Workbook
4.1 Dashboard list
4.2 Dashboard detail view
4.2.1 Dashboard - Basic screen
4.2.2 Dashboard - data source information dialog box
4.2.3 Dashboard - presentation view screen
4.2.4 Dashboard - Edit screen
5. Create/Manage Chart
5.1 Overview of the chart home screen
5.2 Data column list
5.2.1 Composition of data column list
5.2.2 Add a custom column
5.2.3 Dimension and measure
5.3 Pivoting
5.3.1 What is ‘pivoting’?
5.3.2 The concept of column/row/cross shelves
5.4 Chart type
5.5 Chart filter
5.5.1 Filters included automatically
5.5.2 Chart filter panel
5.5.3 Chart filter dialog box
5.6 Chart style setting
5.6.1 Chart style setting menu
5.6.2 'Common Setting' items by chart type

1. Workbook Outline

Workbook is the visual data analysis module based on the metatron Discovery engine, Druid. The
main features are as follows.
 It provides fast and flexible data analysis using a time series-based multi-dimensional data
source.
 Each dashboard contains visualization widgets for various charts and texts, so users can
utilize it as a report for presentations.
 Frequently used algorithms such as clustering, prediction line, or trend line can be
applied through the GUI.

This part is divided as follows.
Chapter 2 Create a Workbook: ‘Workbook’ functions as an independent report. This chapter
describes how to create a new workbook.
Chapter 3 Create a Dashboard: Dashboard means each slide of workbook. This chapter
describes how to create a new dashboard in workbook, and to link that dashboard to a
data source.
Chapter 4 Manage a Workbook: This chapter describes how to sort a list of dashboards that
compose a workbook and to configure various widgets in each dashboard.
Chapter 5 Create/manage a Chart: A chart is one of the widgets that make up the dashboard,
but it has many unique concepts and UI menus, so it is described separately in this
chapter.

2. Create a Workbook
In metatron Discovery, ‘workbook’ functions as an independent data analysis report. Once a
workbook is created, a user can store many ‘dashboards’ in the workbook and present them in the
proper order.
A workbook is created as follows.

1. Click the '+ Workbook' button at the bottom of the workspace. The screen used to create
a workbook will be displayed.

2. Enter a name (mandatory) and description of the workbook to be created and click 'Done'.
With the ‘Continue to create a dashboard of a new workbook’ checkbox selected, the
creation of a workbook proceeds directly to ‘Create a Dashboard’ page. This page is
required because a workbook cannot work without dashboards in it.


3. For more information on how to create a dashboard, see Chapter 3.

4. A user can check the new workbook in the workspace screen as follows. Click that
workbook to open a screen where you can use the workbook.


3. Create a Dashboard
A dashboard is contained in a workbook, and provides functions to analyze and visualize a specific
data source based on the customer's needs. Therefore, an important step to create a dashboard is
linking a data source.
A dashboard is created as follows.

1. From the list of data sources available in the workspace, select a data source to be linked
to the dashboard. Only one root data source can be selected; the user can select
other data sources to be joined to it in the subsequent screen.

❶ Search by data source name: Search a data source allowed in the relevant workspace
by its name.

❷ View only open data: Display only a data source designated as an ‘open data source’.
❸ Type: Display connection type or collection type data sources.
❹ Data source list: Displays data sources that meet the established criteria.
❺ Data source information: Displays an overview of the data source selected in the list.


2. Refer to the description below and add other data sources to be joined to the top data
source selected above. Click 'Next' if you get the result you want.
Basic screen

❶ Data source tree: Displays join relationships between data sources as a tree structure.
 [icon]: Edit the join relationship.
 [icon]: Edit the join relationship. (Same function as the [icon] icon)
 [icon]: Delete the join relationship.

 Click ‘+ Add a datasource to join’ to display a dialog box to join a new data source.
(For more information on this dialog box, see the description on the next page.)

❷ Result data: Displays the results table of joining a data source.


Data join dialog box

❶ Master data source: Displays information on the master data source of the new data
source to be joined.

❷ Data source to be joined: Select a data source to be joined to the master data source.
❸ Add a join key: ‘Join key’ is a key to define the join relationship between the master
data source and the data source to be joined. Select a column to be joined from each
data source, and click this button to add a new join key. In this case, the data type
defined in the column of each data source must be identical.

❹ Join type: Select how to join and transform a data source. To help you understand,
each join type is explained using the following tables as an example.
Master data source
Product name (join key)    Price
[...]                      $22.11
[...]                      $9.23
[...]                      $8.99
[...]                      $10.10

Data source to be joined
Product name (join key)    Sales
[...]                      100
[...]                      200
[...]                      50


 Inner: Joins records with a join key present in both the master data source and the
data source to be joined, and includes them in the resulting table. (Intersection of two
data sources)
Product name (join key)    Price      Sales
[...]                      $9.23      100
[...]                      $10.10     200

 Left: Keeps every record of the left data source (master data source) and imports the
matching data from the right data source (data source to be joined) based on the
data value inside the join key column, and shows it in the results table. Among the
records in the right data source, records whose join key value is not present in the
left data source are discarded.
Product name (join key)    Price      Sales
[...]                      $22.11     null
[...]                      $9.23      100
[...]                      $8.99      null
[...]                      $10.10     200

 Right: Keeps every record of the right data source (data source to be joined) and
imports the matching data from the left data source (master data source) based on
the data value inside the join key column, and shows it in the results table. Among
the records in the left data source, records whose join key value is not present in
the right data source are discarded.
Product name (join key)    Price      Sales
[...]                      $9.23      100
[...]                      $10.10     200
[...]                      null       50

 Full Outer: Joins by importing all data of both data sources based on the data value
inside the join key column and includes them in the resulting table. (Union of two
data sources)
Product name (join key)    Price      Sales
[...]                      $22.11     null
[...]                      $9.23      100
[...]                      $8.99      null
[...]                      $10.10     200
[...]                      null       50

❺ Preview results: Displays the result value of data source joining.
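
The four join types can also be reproduced with a short Python sketch. It is only an illustration of the join semantics described above, not the engine's implementation. The prices and sales values follow the example tables, while the product names (`A` to `E`) are placeholders, and each join key is assumed to appear at most once per data source. `None` stands for null.

```python
# Product names are placeholders; prices and sales follow the example tables.
MASTER = [("A", "$22.11"), ("B", "$9.23"), ("C", "$8.99"), ("D", "$10.10")]
JOINED = [("B", 100), ("D", 200), ("E", 50)]


def join(master, joined, how):
    """Join (key, price) rows with (key, sales) rows; missing values become None."""
    prices = dict(master)
    sales = dict(joined)
    if how == "inner":
        keys = [k for k in prices if k in sales]
    elif how == "left":
        keys = list(prices)
    elif how == "right":
        keys = list(sales)
    elif how == "full":
        keys = list(prices) + [k for k in sales if k not in prices]
    else:
        raise ValueError(f"unknown join type: {how}")
    return [(k, prices.get(k), sales.get(k)) for k in keys]


for how in ("inner", "left", "right", "full"):
    print(how, join(MASTER, JOINED, how))
```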


3. Confirm information on the data source imported to create a dashboard, enter a name and
description, and click ‘Done’ to create a new dashboard.

4. The new dashboard will be added to the home screen of the workbook. Click this to
display the screen of the relevant dashboard.


4. Manage a Workbook
Select a workbook from the list on the workspace screen to display the home screen of the
workbook. On the workbook home screen, the user can view, create, and edit the dashboard. In
addition, the user can display various dashboards in slide show mode via presentation mode. The
home screen of the workbook is divided into the following two areas.

 Dashboard list: Used to configure the workbook and manage dashboard lists. (See
Paragraph 4.1.)
 Dashboard detail view: Displays detailed information on the dashboard selected in the
dashboard list area. In this area, a user can analyze, visualize and sort data through chart,
text and filter widgets. (See Paragraph 4.2.)


4.1 Dashboard list
The dashboard list area can be switched into ‘Dashboard Mode’ or ‘Comment Mode’. Each mode
provides the following function.
 Dashboard: The user can add a new dashboard or list the registered dashboards.
 Comment: The user can add a new comment or list the registered comments. Comments
can be written and viewed by all users who have access to the workbook.
Area common to dashboard and comment modes

❶ Name: Name of the workbook.
❷ More: Click to edit the name/description of the workbook or delete the workbook.
Also, it is possible to check the time of the latest update and creation of the workbook.

❸ Select dashboard or comment mode: Select dashboard or comment mode as the
mode to be displayed. The number in parentheses next to each selection indicates the
number of dashboards/comments registered in the workbook.

❹ Fold: Click this button to collapse or expand the dashboard list.
Dashboard mode

❶ View format: Select format to display a dashboard list. The user can select Thumbnail
view or List view.

❷ Data source: Used to check data sources used in the active workbook. Check or
uncheck the check box of each data source to search only dashboards using a specific
data source.

❸ Shown only: Search only dashboards set as Shown.

❹ Search Formula: Search a dashboard by its name.

❺ Dashboard list: Displays the dashboard list that meets search criteria. Hover the mouse
over a dashboard to copy or delete that dashboard.

❻ Create dashboard: Click this button to display the screen to create a new dashboard. (See
Chapter 3 for more information on how to create it.)
Comment mode

❶ Comment list: Displays comments registered in the workbook in order of latest
registration date.

❷ Add a Comment: Create a new comment. Press Enter to register the entered comment.
Press Shift + Enter to insert a line break.


4.2 Dashboard detail view
The dashboard detail view area displays detailed information on the dashboard selected in
dashboard list area. In this area, a user can analyze, visualize and sort data through chart, text
and filter widgets.

4.2.1 Dashboard - Basic screen
In the basic screen of the dashboard, a user can view charts registered in the dashboard and
move to various screens to configure the dashboard.

A. Overall composition of the dashboard basic screen
The following is a description of the overall composition of the basic screen of the dashboard.

❶ Name and description: Name and description of the dashboard. Hover the mouse over
them to edit.

❷ Update information: Displays the user name and time of the latest dashboard update.
❸ Data source: Click to see information on the data source used in the dashboard as well
as the related statistics/schema view. (For detailed information, see Paragraph 2.1.3 A)

❹ Presentation view: Used to view the workbook with the appropriate UI for presentation.
Click it to display a slide corresponding to the active dashboard. Also, the user can
move to another dashboard. (See Paragraph 4.2.3.)

❺ Edit dashboard: Click the button to display a screen to edit widgets on the dashboard.
(See Paragraph 4.2.4.)

❻ More: Click it to clone or delete the dashboard. Also, the user can check information on
the recent modification and creation of the dashboard.

❼ Add a widget: The menu to add a widget in the dashboard. Currently chart, text and
filter widgets are available. (See Paragraph 4.2.4 D)

❽ Widget layout area: Displays widgets configured to be shown in the layout screen of
the dashboard.

B. Chart widget box in the dashboard basic screen
The following is a description of the chart widget box in the widget area. Hover the mouse
over the box to display the widget's settings icons at the top right of the screen.
❶ Select a data area: Choose how data items in the chart graph are selected with the mouse
cursor. Selecting specific data items filters all charts in the dashboard by the relevant
dimension category.

❷ Zoom in/out for chart: Used to zoom in/out of the chart screen. Press the reset button to
reset the zoom.

❸ Chart info: Displays pivoted data information when creating the chart.
❹ Save data table: Save the chart's data information as a local file.
❺ Save chart image: Save an image of the chart as a jpg file.
❻ Edit: Click to display a dialog box to edit the widget.
❼ Expand to full screen: Expands the chart to the entire dashboard detail view area.
❽ Chart graph area: Area to display the relevant chart graph. Select a specific data item(s)
to filter and display all charts in the dashboard based on the relevant dimension
category.

❾ Chart mini-map: Displays a value distribution map of the chart by data category. Apply
a filter to reduce the scope of the mini-map.

C. Text widget box in the dashboard basic screen
The following is a description of the text widget box in the widget area. Hover the mouse
over the box to display icons to edit the widget at the top right of the screen.
❶ Text area: Area to display the written text.
❷ Edit: Click to display a dialog box to edit the widget.

D. Filter widget box in the dashboard basic screen
The following is a description of the filter widget box in the widget area. Hover the mouse
over the box to display icons to edit the widget at the top right of the screen.
❶ Filter area: Area to display the configured filter. A user can edit the scope of the filter.
❷ Edit: Click to display a dialog box to edit the details of the widget.

Dashboard - data source information dialog box
Click the Data source button in the dashboard basic screen to display a dialog box with information
on the data source used in the dashboard. This dialog box is composed of 3 tabs (Data grid,
Column detail, Dashboard data information).
Data grid tab
Displays all record values of the data source.

Column detail tab
Displays detailed information on each column of the data source.

Dashboard data information tab
Displays an overview of the data source.

Dashboard - presentation view screen
Click ‘Presentation View’ in the dashboard basic screen to view workbook dashboards with the
appropriate UI for presentation. With this UI, a user can easily report and share data analysis
results.
❶ Name: Name of the current dashboard.
❷ Slide navigation: Each circle represents one dashboard in the workbook. For example,
if the user clicks the 4th circle, the 4th dashboard slide will be displayed and that circle
will be highlighted.

❸ Auto slide show setting: Select the time interval and click PLAY to start the auto slide
show. Slides change at the selected time interval.

❹ Exit: Close the presentation view and return to the workbook/dashboard basic screen.

Dashboard - Edit screen
Click ‘Edit Dashboard’ in the dashboard basic screen to move to a screen to edit the composition
of the dashboard. A user can add a widget, edit the dashboard, set the hierarchy and change
the layout.

A. Overall composition of the dashboard edit screen
❶ Name: Displays the name of the dashboard.
❷ Add widget menu: The menu to add a widget such as chart, text and filter widgets.
❸ Dismiss: Click to exit the current screen without saving changes.
❹ Done: Click to save changes and exit the current screen.
❺ Widget layout area: Edits the arrangement, display and attributes of widgets configured
to be shown in the layout of the dashboard. (See Paragraphs B and C for more
information.)

❻ Panel area: Used to add/edit/delete various widgets and edit settings related to how to
display the dashboard. (See Paragraph D for more information.)

B. Widget arrangement setting
❶ Change widget location: Drag the title of a widget to change the location of the
widget.

❷ Adjust widget width: Drag the boundary between widgets to adjust their width.
❸ Add a widget to the screen: Drag a widget from the widget list in the right panel to
the left widget layout area. The widget will be added to the layout area.

❹ Delete a widget from the screen: Click the delete button in a widget shown in the widget
layout area. That widget will be deleted from the layout area.

C. Individual widget edit area
❶ Widget name: Show or hide the name of a widget.
❷ Legend: Show or hide the legend.
❸ Mini-map: Show or hide the mini-map.
❹ Chart info: Displays pivoted data information when creating the chart.
❺ Edit: Click to display a dialog box to edit the widget.
❻ Copy: Click to clone the chart and display it on the screen.
❼ Delete: Click to remove the chart from the dashboard screen.

D. Panel area in dashboard edit screen
Chart widget panel
In the chart widget panel, a user can add/edit/delete a chart in the dashboard.
❶ Number of chart widgets: Displays the number of chart widgets registered in the active
dashboard.

❷ Add a chart widget: Used to create a new chart widget in the dashboard. (See Chapter
5 for more information on how to add it.)

❸ Chart widget list: Lists chart widgets registered in the active dashboard. Hover the
mouse over a widget to display the icons for editing or deleting it. Drag a
widget to the widget layout area to display the widget in the layout area.

❹ Set chart hierarchy: Used to set parent/child relationships of charts in the dashboard.
Select a data item from the parent chart. The child chart will be filtered by that item. To
set the hierarchy, drag a chart to be set as ‘child’ under its parent chart. Once the chart
hierarchy is set, the user can check the modified structure in the chart menu.
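
The parent/child filtering described above can be pictured with a small sketch. The Python below is illustrative only: the record layout and the `filter_child_rows` helper are assumptions made for this example and are not part of metatron Discovery.

```python
# Hypothetical rows behind a child chart; the field names are assumptions.
rows = [
    {"region": "East", "product": "Chair", "sales": 120},
    {"region": "East", "product": "Desk", "sales": 300},
    {"region": "West", "product": "Chair", "sales": 90},
]

def filter_child_rows(rows, dimension, selected_item):
    """Keep only the rows whose parent dimension matches the selected item."""
    return [row for row in rows if row[dimension] == selected_item]

# Selecting "East" in the parent chart narrows the data shown by the child chart.
child_rows = filter_child_rows(rows, "region", "East")
print([row["product"] for row in child_rows])
```

In this sketch the child chart would draw only the rows returned for the item selected in the parent chart.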

Text widget panel
In the text widget panel, a user can add/edit/delete a text widget in the dashboard.
❶ Number of text widgets: Displays the number of text widgets registered in the active
dashboard.

❷ Add a text widget: Used to create a new text widget in the dashboard.
❸ Text widget list: Lists text widgets registered in the active dashboard. Hover the mouse
over a widget to display the icons for editing or deleting it. Drag a widget to
the widget layout area to display the widget in the layout area.

Filter widget panel
In the filter widget panel, a user can add/edit/delete a filter widget in the dashboard.
❶ Number of filter widgets: Displays the number of filter widgets registered in the active
dashboard.

❷ Add a filter widget: Used to create a new filter widget in the dashboard.
❸ Filter widget list: Lists filter widgets registered in the active dashboard. Hover the
mouse over a widget to display the icons for editing or deleting it. Drag a
widget to the widget layout area to display the widget in the layout area.

This panel has the same functions and structure as the chart filter panel in the chart home.
For more detailed information, see Paragraph 5.5.2. Any filter created here is given the
‘global’ attribute, which applies the filter to all charts.

Layout panel
In the layout panel, a user can adjust some of the settings on how to arrange widgets and
display an individual widget in the widget layout area.

❶ Set board height
 Fix to screen: Match the height of the board to the screen.
 Fix to height: Set the height of the board as a specific pixel value.

❷ Margin between widgets: Used to set the margin between widgets shown in the
widget layout area.

❸ Chart title: Used to set the title display of all chart and filter widgets in the widget
layout area.

❹ Legend: Used to set the legend display of all chart widgets in the widget layout area.
❺ Mini-map: Used to set the mini-map display of all chart widgets in the widget layout
area.

Data source panel
In the data source panel, a user can view/edit information of the linked data source, as well
as conveniently add/delete a column filter.

This panel has the same functions and structure as the data column list in the chart home.
For more detailed information, see Paragraph 5.2. Note, however, that a filter set or canceled
in this panel is a dashboard filter, whereas a filter set or canceled in the chart home is a chart
filter.

5. Create/Manage Chart
Each dashboard in the workbook is fundamentally composed of various charts that visualize the
analyzed data. This section describes the concepts that you need to know to create a
chart for data analysis, as well as the UI used to configure charts in Discovery.

5.1 Overview of the chart home screen
The home screen of the chart is divided into the following three areas.

❶ Column/Chart selection area: The UI is set up in order of actions required to create a
chart. A user can pivot a chart by selecting Data (data column list), and visualize data by
selecting Chart (chart type list). In addition, the user can load analysis conditions to a
chart using Analytics.

❷ Visualization area: This area is composed of the pivotable shelf area and the
visualization area where the actual chart is drawn. Once data and chart are selected in
the column/chart selection area, the chart is drawn in this area.

❸ Option area: Used to customize the appearance and display of a chart. The option area
is composed of filter, palette, axis, numeric expression and chart expression.

Using Workbook

Create/Manage Chart

5.2 Data column list
Composition of data column list
In the data column list, a user can view/edit linked data source information, as well as
conveniently add/delete a column filter.

❶ Search by column name: Search for a column in the data source by its name.
❷ Add custom column: Click to open the dialog box to create a new column by
combining/processing columns in the data source. The added custom column can be
used in the entire area of the dashboard.

❸ Set/cancel filter: Hover the mouse over a column to display this button. Click to set the
column as a chart filter, and click again to cancel the chart filter. For columns set as a
filter, the filter icon is displayed regardless of mouseover.

❹ More: Used to check additional information on the column and set an alias.
 Column detail: Used to check the overview and data values of the column.
 Alias: Used to set a column alias. A formal column name can contain only
alphanumeric characters and some special characters, and spaces are not allowed.
Registering a unique alias therefore makes analysis more convenient. The alias is applied to
the entire area of the dashboard.
 Value alias: An alias can be set to each data value in the column. The alias is applied
to the entire area of the dashboard.

Add a custom column
Click the Add custom column button in the data column list to open the dialog box to add a custom
column. A user can create a new column necessary to create a chart by applying various
formulas to the existing columns in the data source.

❶ Column name: Field for entering the name of the custom column.
❷ Code area: Field for writing the code that creates the custom column. Click an item in the
column or formula list below and it will be entered in this area automatically.

❸ Add column: A list of the existing columns in the data source. Click a column in the list
and it will be entered in the code area automatically.

❹ Add formula: A list of formulas supported by metatron. Click a formula in the list. The
formula will be entered in the code area automatically, and the cursor will move
to the parameter input field. For the purpose, usage instructions and an example of each
formula, see the help text on the right of the screen.

Dimension and measure
Columns of a data source linked to the dashboard are divided into dimension columns and
measure columns as follows. To fully utilize the chart function of Discovery, you must clearly
understand the concepts of dimension and measure.

Dimension column
A categorical data column with the following characteristics.
 A categorical (not aggregated) data field (e.g., Category, Region, Organization)
 Provides the criteria by which measures are displayed

Measure column
A quantitative data column with the following characteristics.
 A field containing aggregated or quantitative information (e.g., Sales)
 Displayed in the chart according to the criteria provided by the dimension
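
As a rough illustration of how a dimension provides the criteria for a measure, the following Python sketch groups a measure by a dimension. The sample records and the `sum_by_dimension` helper are hypothetical and only mirror the concept. They do not represent how Discovery aggregates data internally.

```python
from collections import defaultdict

# Hypothetical records: "region" is a dimension, "sales" is a measure.
records = [
    ("East", 100),
    ("West", 250),
    ("East", 50),
]

def sum_by_dimension(records):
    """Aggregate the measure (second field) per dimension value (first field)."""
    totals = defaultdict(int)
    for region, sales in records:
        totals[region] += sales
    return dict(totals)

totals = sum_by_dimension(records)
print(totals)
```

Each distinct dimension value becomes one category in the chart, and the aggregated measure becomes the value drawn for it.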

5.3 Pivoting
What is ‘pivoting’?
‘Pivoting’ or 'shelfing' refers to selecting a column in the column/chart selection area
and placing it on the column/row/cross shelf in the shelf area.

The above figure displays pivoting of two dimension columns in the column shelf, and pivoting
of one measure column in the cross shelf. The chart displays the data of the pivoted columns.
The mandatory/recommended column types of each shelf differ by chart type. Select the
chart type before pivoting to show the necessary column types for the shelf.

The concept of column/row/cross shelves
Think of the structure of Excel to understand the concept of column/row/cross shelves. As
shown below, a column/row defines the block, and cross defines the value to be entered in the
block.

metatron's column/row/cross does similar things. Excel displays column/row/cross in a
two-dimensional grid, but metatron displays column/row/cross in a three-dimensional cube. As a
tool of OLAP Data Discovery, metatron searches data in a three-dimensional space via OLAP
Cube. (For more information on OLAP, see metatron Outline in Part 1.) The following chart is
an axis figure of column/row/cross values expressed as a three-dimensional cube in metatron.

We can assume that if values of an Excel grid are displayed in a three-dimensional chart, multiple
cross values will create various bars. However, metatron displays a chart as a two-dimensional
cross section, so bars will be stacked based on columns and rows. As a result, metatron will
display the two-dimensional chart in a similar way to the gray area in the bottom figure.
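
The column/row/cross idea can be sketched in a few lines of Python. This is a minimal sketch under assumptions: the sample records, the choice of SUM as the aggregation, and the `pivot` helper are all hypothetical and are used only to show how two shelves define a block and the cross shelf supplies its value.

```python
from collections import defaultdict

# Hypothetical records: (year, region, sales).
records = [
    (2017, "East", 100),
    (2017, "West", 80),
    (2018, "East", 120),
    (2017, "East", 30),
]

def pivot(records):
    """Build a grid keyed by (row value, column value) holding the summed cross value."""
    grid = defaultdict(int)
    for year, region, sales in records:
        # region plays the role of the row shelf, year the column shelf,
        # and sales the cross shelf.
        grid[(region, year)] += sales
    return dict(grid)

grid = pivot(records)
print(grid[("East", 2017)])
```

Each key of `grid` corresponds to one block of the Excel-like grid, and the stored number is the cross value that fills it.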

5.4 Chart type
metatron Discovery provides more than 20 chart types. If the user pivots a column before selecting
a chart, suitable charts will be highlighted in purple.

The following table summarizes creation conditions and use attribute/type/example for each chart.

Bar chart
 Creation condition: Column: 1 or more dimensions & cross: 1 or more measures
 Use attribute: Compares the value of each item.
 Use type: Used to compare groups or view trends over time. Very effective when the trend is fluctuating.
 Use example: Comparison of sales and profits by product

Table
 Creation condition: Column or row: 1 or more dimensions & cross: 1 or more measures
 Use attribute: Displays the cross data of each item as text.
 Use type: Used to view measures that meet certain criteria. Useful to check detailed data and accurate values. Not good for visualization.
 Use example: Detailed sales data by year

Line chart
 Creation condition: Column: 1 or more dimensions & cross: 1 or more measures
 Use attribute: Displays change of data over time.
 Use type: Used to view trends over time. If the trend is not fluctuating, a line chart is more effective than a bar chart.
 Use example: Monthly sales trend

Scatter chart
 Creation condition: Column: 1 measure & row: 1 measure & cross: 1 or more dimensions
 Use attribute: Displays the relation between various items.
 Use type: Used to define the relation between two variables.
 Use example: The relation between product sales and profit

Heatmap
 Creation condition: Column or row: 1 or more dimensions & cross: 1 or more measures
 Use attribute: Displays the cross data of each item as a color distribution.
 Use type: Used to compare two variables intuitively based on color and size. Used to emphasize the visual elements of a table chart.
 Use example: Sales of each product by region

Pie chart
 Creation condition: Cross: 1 or more dimensions, 1 or more measures
 Use attribute: Ratio of each item over the whole
 Use type: Used to compare components of something.
 Use example: Comparison of web browsers' market share

Control chart
 Creation condition: Column: 1 dimension in time attribute & cross: 1 or more measures
 Use attribute: Displays characteristics values for processing status.
 Use type: Used to quickly deliver information on the current organizational performance.
 Use example: Number of customers introduced in this year or organizational performance index

Key indicators
 Creation condition: Cross: 1 or more measures
 Use attribute: Displays main indicators with trends.
 Use type: Used to compare the distribution of each group or indicate the target of a specific value.
 Use example: Comparison of the distribution of delay time by airplane type

Boxplot
 Creation condition: Column: 1 or more dimensions, row: 1 dimension, cross: 1 measure
 Use attribute: Indicates an increase and decrease of value.
 Use type: Used to emphasize an increase and decrease of value over time.
 Use example: Monitoring change of the number of team members for a certain period, or stock

Waterfall Chart
 Creation condition: Column: 1 dimension in time attribute & cross: 1 measure
 Use attribute: Displays the sum when adding or subtracting a value.
 Use type: Used to summarize and emphasize important words.
 Use example: Summary of customer comments

Word Cloud
 Creation condition: Cross: 1 or more dimensions, 1 measure
 Use attribute: Displays the size of text proportional to its appearance frequency.
 Use type: Used to emphasize various types of information.
 Use example: Simultaneous monitoring of price and sales by product

Combo Chart
 Creation condition: Column: 1 or more dimensions & cross: 2 or more and less than 4 measures
 Use attribute: Compares data by combining bar and line charts.
 Use type: Used to visualize hierarchy data.
 Use example: Monitoring of sales by product (major class-medium class-minor class)

Treemap
 Creation condition: Column: 1 dimension & row: 1 or more dimensions & cross: 1 measure
 Use attribute: Displays hierarchy data as a group of overlapped quadrangles.
 Use type: Used to intuitively compare various measurement targets.
 Use example: Comparison of products evaluated by 5 factors of quality

Radar Chart
 Creation condition: Cross: 1 dimension, 1 or more measures
 Use attribute: Displays various evaluation factors from the central point.
 Use type: Used to view the flow of generated data.
 Use example: Monitoring the flow of a project task

Network Diagram
 Creation condition: Subject shelf: 1 dimension & target shelf: 1 dimension & connecting shelf: 1 measure
 Use attribute: Connection type diagram displaying factors with dependency
 Use type: Used to monitor the quantitative flow of data.
 Use example: Monitoring the energy flow in the factory

Sankey Diagram
 Creation condition: Column: 3 or more dimensions & cross: 1 measure
 Use attribute: Displays proportion of flow by the width of the connection line.
 Use type: Used to view data proportion.
 Use example: Monitoring of profit by region

Gauge Chart
 Creation condition: Column/row: 1 or more dimensions & cross: 1 measure
 Use attribute: Visualizes performance for the established target.
 Use type: Used to compare groups or view trends over time. Very effective when the trend is fluctuating.
 Use example: Comparison of sales and profits by product

5.5 Chart filter
A chart filter limits the scope of each column's data to be shown in the chart. This part describes
how to set and use chart filters and consists of the following.
 5.5.1 Filters included automatically: Describes basic filters that do not need to be added.
 5.5.2 Chart filter panel: Describes the chart filter panel shown on the right of the chart
home. In this panel, a user can conveniently search and set registered filters.
 5.5.3 Chart filter dialog box: Describes the chart filter dialog box opened from the chart filter
panel. With this dialog box, a user can add or configure a chart filter.

Filters included automatically
The following column filters are included automatically so it is not necessary to add a chart
filter.
 Timestamp column filter: Due to the time-series characteristic of metatron Discovery,
time condition filtering is required.
 Recommended filter: Column filters designated as a 'recommended filter' in the data
source.
 Dashboard filter with a ‘global’ attribute: Filters applied to all charts registered in the
dashboard.

Chart filter panel
Click the filter icon at the top of the options area to display the chart filter panel as follows. In
this panel, a user can search and edit the basic information of registered filters.
❶ Filter number: Displays the number of currently registered filters.
❷ Add/edit filter: Click to add a new filter or display the dialog box to configure an
existing filter.

❸ Filter name: Displays the column name of the filter.
❹ Filter attribute: Displays the basic attributes of the filter as an icon.
 Timestamp icon: Indicates that the column of the filter is a timestamp column.
 Global icon: Indicates that the filter is a dashboard filter with a ‘global’ attribute.

❺ More: Used to reset or configure the filter.
❻ Filtering scope: Set the scope of data values to be shown in the chart.

Chart filter dialog box
Click the Add/edit filter button at the top of the chart filter panel, or the corresponding button
in each filter area, to open the chart filter dialog box. With this dialog box, a user can add a
new filter or configure an existing filter.

A. Composition of chart filter dialog box
The chart filter dialog box is divided into the Dimension and Measure sections as shown
below:
❶ Dimension: You can select a dimension from the connected data source to make a
filter. For how to configure a dimension filter, see Subsection C. The timestamp icon
indicates a timestamp column, for which a timestamp filter can be configured. For
how to configure a timestamp filter, see Subsection B.

❷ Measure: You can select a measure from the connected data source to make a filter.
For how to configure a measure filter, see Subsection D.

B. Timestamp column filter setting
❶ Name: Displays the name of the column where the filter is applied.
❷ Filter attribute: Displays the basic attributes of the filter.
 Timestamp: Indicates that the filter column is a timestamp column.
 Chart name: Displays the name of the chart to which the filter applies.

❸ Set default status: Set the time range to be shown in the chart.
 All: Displays the contents of columns during the entire period on the chart without
applying time filtering.
 Period: Displays the contents of columns during a specific period on the chart. Click
‘Set to current time’ to set the latest reference time as the present time.
 Add/delete time range: Use the add and delete icons in the lower right to add or
delete a time range.
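
The 'All' and 'Period' options can be pictured with a short sketch. The Python below is a hypothetical illustration: the event rows, the inclusive range boundaries, and the `filter_by_period` helper are assumptions for this example and do not describe how Discovery evaluates the filter internally.

```python
from datetime import datetime

# Hypothetical event rows: (timestamp, value).
events = [
    (datetime(2018, 1, 5), 10),
    (datetime(2018, 2, 20), 20),
    (datetime(2018, 6, 1), 30),
]

def in_any_range(timestamp, ranges):
    """Return True if the timestamp falls inside at least one (start, end) range."""
    return any(start <= timestamp <= end for start, end in ranges)

def filter_by_period(events, ranges=None):
    """With no ranges ('All'), keep everything; otherwise keep rows inside the ranges."""
    if not ranges:
        return list(events)
    return [event for event in events if in_any_range(event[0], ranges)]

# Two time ranges, as if a second range had been added with the add icon.
ranges = [
    (datetime(2018, 1, 1), datetime(2018, 1, 31)),
    (datetime(2018, 5, 1), datetime(2018, 6, 30)),
]
selected = filter_by_period(events, ranges)
print([value for _, value in selected])
```

Treating multiple ranges as a union (a row is kept if it falls in any range) is the reading assumed here.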

C. Dimension column filter setting
❶ Name: Displays the name of the column where the filter is applied.
❷ Chart name: Displays the name of the chart to which the filter applies.
❸ Select range: Select a range to be shown on the chart by filtering data categories
included in the column of the selected filter.
 Single item: Select one data category and display it on the chart.
 Multiple items: Select multiple data categories and display them on the chart.

❹ Search by name: Search data values in the selected filter by their name.
❺ Filtering: Filters data categories to be displayed.
 Wild card: Search data in the selected filter by specific characters. For example, to
view only data beginning with 'L', enter L in ‘The first word’ and click Apply. To view
only data ending with '89', enter 89 in ‘The last word’ and click Apply. To view only
data containing 'Cart', enter Cart in ‘Contain’ and click Apply.
 Condition: Filter data in the column by applying a condition to a measure. Select the
measure in the leftmost field. Data whose SUM/AVG/COUNT/MIN/MAX of the measure
is equal to/over/under/not less than/not more than the target value will be displayed.
 Limitation: Filter data in the column by its upper or lower ranking on a measure.
Select the measure to be filtered. Data will be displayed after being filtered by the
upper/lower ranking of the SUM/AVG/COUNT/MIN/MAX of the measure, limited to the
number selected by the user.

❻ Sort: Select how to sort data in the list.
 Frequency ascending/descending: Sorts the data in the list in ascending/descending
order by frequency.
 Alphanumeric ascending/descending: Sorts the data in the list in ascending/descending
order by name.

❼ Item display icon: Used to display only specific items from the data categories in the list.
Activate the icon next to each item to be displayed, then click the icon at the top to display
only the selected items.

❽ Defined value: Used to add a data category that does not yet exist in the column as a filter
condition. This function is needed to create a filter in advance for a data category that may
be added later.
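
The wild card, condition and limitation options above can be approximated in plain Python. This is a minimal sketch under assumptions: the sample rows, the use of SUM, the "not less than" comparison, and the helper names are all hypothetical, and combining the three wild card fields with AND is an assumed reading of the dialog box.

```python
from collections import defaultdict

# Hypothetical rows: (category, sales). Names are assumptions for illustration.
rows = [("Lamp", 40), ("Ladder", 70), ("Cart", 20), ("Lamp", 60), ("Golf Cart", 90)]

def wildcard(categories, prefix="", suffix="", contains=""):
    """Keep categories that match the prefix, suffix and substring conditions."""
    return [
        c for c in categories
        if c.startswith(prefix) and c.endswith(suffix) and contains in c
    ]

def sum_by_category(rows):
    """SUM of the measure per category."""
    totals = defaultdict(int)
    for category, sales in rows:
        totals[category] += sales
    return dict(totals)

def condition(totals, minimum):
    """Keep categories whose aggregated measure is not less than the minimum."""
    return {c: v for c, v in totals.items() if v >= minimum}

def limitation(totals, n):
    """Keep the n categories with the highest aggregated measure."""
    return sorted(totals, key=totals.get, reverse=True)[:n]

categories = sorted({category for category, _ in rows})
totals = sum_by_category(rows)
print(wildcard(categories, prefix="L"))
print(wildcard(categories, contains="Cart"))
print(condition(totals, 80))
print(limitation(totals, 2))
```

The wild card works on category names alone, while the condition and limitation options first aggregate a measure per category and then filter on the aggregated value.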

D. Measure column filter setting
❶ Name: Displays the name of the column where the filter is applied.
❷ Chart name: Displays the name of the chart to which the filter applies.
❸ Select range: Select the minimum and maximum data values to be shown on the chart,
from the column of the selected filter.

5.6 Chart style setting
After data pivoting, an option menu to set the chart style will be displayed on the right. The
composition of the menu differs by chart type. Section 5.6.1 describes the setting items applied
to all chart types, and Section 5.6.2 describes the 'Common Setting' items of each chart type.

Chart style setting menu
This part describes how to set each item of the chart style setting menu. Please note that among
the following items, some items may be unavailable in a specific chart type.
Common setting
Defines the shape of the chart. Items of a common
setting are different by chart type. See *** for more
detailed information on the common setting of each
chart type.

Color setting
Defines various colors in the chart.

❶ Graph color setting: Set criteria to classify the
color of items to display data in a graph, and
select the color theme.
 Series: Classifies color based on the type of
measure.
 Dimension: Classifies color based on the type of
dimension.
 Measure: Classifies color based on the size of
the measure.

❷ Text color: Select the text color of the dimension.
❸ Area color: Select the background color of the
table.

❹ Color range setting: This item is displayed when ‘Measure’ is selected as criteria to
classify the data indication color. Set 'ON' to change the color by the range of the
measure. The color range can be subdivided at will from the lowest section. To add a
new section, adjust the maximum value of the final section first and click ‘Add a new
range’.
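
A color range setting of this kind amounts to looking up which section a measure value falls into. The sketch below is illustrative only: the section boundaries, the color codes, the inclusive upper bounds, and the `color_for` helper are assumptions, not values used by Discovery.

```python
from bisect import bisect_left

# Hypothetical ranges: each entry is the upper bound of a section.
# The last color has no upper bound and catches everything above the final bound.
upper_bounds = [100, 500]
colors = ["#c6dbef", "#6baed6", "#08519c"]

def color_for(value):
    """Return the color of the section that the measure value falls into."""
    return colors[bisect_left(upper_bounds, value)]

print(color_for(50), color_for(300), color_for(750))
```

Adding a new range in this model means appending one more upper bound and one more color, which mirrors adjusting the maximum of the final section before adding a section.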

Number format
Defines how to display data values to be shown as text in the chart graph. To use this
function, enable the label display function from the data label setting menu.

❶ Display format: Select the display format of a data value among number, currency,
percent and exponent.

❷ Decimal place setting: Select the decimal places for a data value.

❸ Number abbreviation setting: Set K (thousands), M (millions) or B (billions) as an
abbreviation for a large data value. Select ‘Automatic Adjustment’ to set the most suitable
unit automatically based on the digits of data values.

❹ Use thousands separator: Select whether to display data values using a thousands
separator.

❺ Custom symbol setting: Insert custom text before/after data values.

❻ Preview: Displays the actual result of the defined number format.
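
The combined effect of these options can be sketched as a small formatting function. This is a hedged illustration, not Discovery's implementation: the `format_value` helper, its parameter names, and the rule used for automatic adjustment (pick the largest unit not exceeding the value) are assumptions.

```python
def format_value(value, decimals=0, abbreviation=None, use_separator=True,
                 prefix="", suffix=""):
    """Format a number roughly as the options above describe (an assumed reading)."""
    units = {"K": 1_000, "M": 1_000_000, "B": 1_000_000_000}
    if abbreviation == "auto":
        # Pick the largest unit that does not exceed the magnitude of the value.
        abbreviation = None
        for name, size in units.items():
            if abs(value) >= size:
                abbreviation = name
    unit_text = ""
    if abbreviation in units:
        value = value / units[abbreviation]
        unit_text = abbreviation
    # "," in the format spec inserts a thousands separator.
    pattern = f",.{decimals}f" if use_separator else f".{decimals}f"
    return f"{prefix}{format(value, pattern)}{unit_text}{suffix}"

print(format_value(1234567, decimals=2))
print(format_value(1234567, decimals=1, abbreviation="M"))
print(format_value(45000, abbreviation="auto", prefix="$"))
```

The preview item in the menu plays the same role as printing the result here: it shows what a value looks like once every option has been applied.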

Y-axis setting (based on vertical chart type)

Defines how to display the Y-axis of the chart. Change the chart type to 'Horizontal Type'
in Common Setting to switch between X-axis and Y-axis settings.

❶ Enter axis title: Used to enter the title of the Y-axis of the chart. Disable this option to
hide the title of the Y-axis.

❷ Show label: Select whether to display the data label on the Y-axis of the chart. Disable
this option to hide the data label of the Y-axis.
 Label setting: Set the number format displayed on the data label of the Y-axis. Set 'Auto'
to apply the settings of ‘Number Format’, or 'Manual' to set a separate format for the data
label of the Y-axis.

X-axis setting (based on vertical chart type)
Defines how to display the X-axis of the chart. Change the chart type to 'Horizontal Type'
in Common Setting to switch between X-axis and Y-axis settings.

❶ Enter axis title: Used to enter the title of the X-axis of the chart. Disable this option to
hide the title of the X-axis.

❷ Show label: Select whether to display the data label on the X-axis of the chart. Disable
this option to hide the data label of the X-axis.
 Label rotation: Select the angle of the data label of the X-axis as 0/45/90 degrees.

Data label setting
Select whether to display the data value on the graph of
the chart.

'Common Setting' items by chart type
This part describes how to style the 6 most popular charts (bar chart, table, line chart, scatter
chart, heatmap and pie chart).

A. Bar chart
Displays data values in each category of the dimension column as a bar.
❶ Chart type: Defines the shape of the chart.
 Vertical Type: Displays data values as a vertical bar based on the vertical dimension
axis.
 Horizontal Type: Displays data values as a horizontal bar based on the horizontal
dimension axis.
 Parallel Type: If 2 or more measures are selected, bars representing each measure are
displayed in parallel.
 Overlapped Type: If 2 or more measures are selected, all measures are displayed in
one bar.

❷ Limitation: Set the number of columns to be shown in the chart.

B. Table
Creates a table block based on the categories of dimension columns pivoted to column/row
shelves. The corresponding measures will be displayed in the cross area as text.

❶ Chart type: Defines the shape of the chart.
 Pivot data: Aggregate measures with the same dimension category for classification in
one cell (SUM, MIN, MAX, etc.).
 Original data: Displays all original measures based on a specific dimension column
without aggregation.

 Vertical view: Displays measures vertically in the table. This function is not available
when using the original data type.
 Horizontal view: Displays the table horizontally when using pivot data type. Displays
measures horizontally in the table.

❷ Show Head Column: Set text alignment in the head column as horizontal or vertical.
When using the original data, the head column is mandatory. When using pivot data
type, the head column is optional.

C. Line chart
Displays data values in each category of the dimension column as dots. Dots of adjacent
category items are connected with each other. Used to view the trend.
❶ Chart type: Defines the shape of the chart.
 Line type: Displays the chart by connecting lines from the measure.
 Area type: Displays the chart by applying color to the area created by lines.
 Line & Point: Displays both dots based on the measure and lines connecting the dots.
 Point: Displays only dots based on the measure.
 Line: Displays only connection of lines.
 Basic type: Displays measures on the chart.
 Accumulation type: Displays the accumulation of measures on the chart.
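
The difference between the basic type and the accumulation type can be sketched as follows. The measure values are hypothetical, and the running total shown here is an assumption about what "accumulation" means in the chart.

```python
from itertools import accumulate

# Hypothetical monthly measure values for one dimension category.
monthly_sales = [120, 80, 100, 50]

# Basic type: plot the measures as they are.
basic = list(monthly_sales)

# Accumulation type: plot the running total of the measures.
accumulated = list(accumulate(monthly_sales))

print(basic)
print(accumulated)
```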

D. Scatter chart
Displays data values in each category of the dimension column as a specific symbol.

❶ Symbol type: Set the shape of the symbol to be shown on the chart.
❷ Symbol transparency: Set the transparency of the symbol to be shown on the chart.
Select solid color or transparent.

E. Heatmap
Displays each data value of the measure column on the cross shelf as a color. For a larger
data value, a stronger color will be applied. Heatmap does not provide any common setting
items.
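
The idea that a larger value receives a stronger color can be sketched as a simple scaling step. The manual does not state which color scale metatron Discovery uses, so the linear scaling below is an assumption made only for illustration, and color_intensity is a hypothetical helper.

```python
def color_intensity(values):
    """Scale each value linearly to [0.0, 1.0]; larger values get a stronger color."""
    low, high = min(values), max(values)
    if high == low:
        # All values are equal, so every cell gets the same intensity.
        return [1.0 for _ in values]
    return [(v - low) / (high - low) for v in values]

# Hypothetical measure values of four heatmap cells.
cell_values = [10, 20, 40, 50]
intensities = color_intensity(cell_values)
print(intensities)
```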

F. Pie chart
Visualizes the proportion of each category item of the dimension column.

❶ Chart type: Defines the shape of the chart.
 Fan type: Displays the chart as a pie.
 Donut type: Displays the chart as a donut.
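
Both chart types draw the same proportions. As a minimal sketch, the share of each category can be computed as below; the category names and totals are hypothetical.

```python
def proportions(category_totals):
    """Return each category's share of the overall total as a percentage."""
    total = sum(category_totals.values())
    return {name: 100 * value / total for name, value in category_totals.items()}

# Hypothetical measure totals per dimension category.
shares = proportions({"A": 50, "B": 30, "C": 20})
print(shares)
```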

Part 5

Using Notebook

Contents
1. Notebook Outline
2. Initial Notebook Server Setting
3. Create a Notebook
4. Use Functions of a Notebook
4.1 Detailed notebook search
4.2 Notebook coding
4.3 Register a notebook API

1. Notebook Outline

metatron Discovery Notebook provides an environment for data analysis based on machine
learning. The main features are as follows.
 Data to be analyzed can be loaded from the data source saved in the Druid engine, or
extracted from a dashboard or chart inserted in a workbook.
 It supports the following external analysis tools and languages.
External analysis tools    Available languages
Jupyter                    R, Python
Zeppelin                   Spark

Using these tools, analysts can analyze/sort/forecast data as they wish and share their analysis
process with others.
This part is divided as follows.
Chapter 2 Initial Notebook Server Setting: To link Jupyter or Zeppelin mentioned above, a
server to host these tools must be registered in metatron Discovery. This chapter
describes how to register a Jupyter or Zeppelin server.
Chapter 3 Create a Notebook: Each document that stores machine learning-based analysis
code is called a ‘Notebook’. This chapter describes how to create a Notebook, link data
to be analyzed, and decide on the tool and language for analysis.
Chapter 4 Use Functions of a Notebook: This chapter describes how to write code in a newly
created notebook and execute/share the results.

2. Initial Notebook Server Setting
To analyze the data in the workspace using Notebook, a user must initialize the notebook server.
To initialize the Notebook server, the Notebook server must be registered. For more information,
see ‘4. Notebook Management’ of ‘Part 3 Management’.
The procedure for initializing the notebook server is as follows.
1. Click the button in the top-right corner of the workspace and select ‘Set notebook
server’.

2. Once the Set notebook server screen is displayed, refer to the description below of each
item and select the notebook server that you want. Then click ‘Done’.

❶ Select server type: Among Jupyter or Zeppelin servers registered by the user, click the
notebook server to be linked to the workspace.

❷ Connected server: The name of the currently selected notebook server in the server list.
❸ Search by server name: Search a registered notebook server by its name.
❹ Server list: Displays notebook servers that meet the established criteria.

3. Create a Notebook
Once the notebook server is initialized, a user can create a notebook. To create a notebook, the
user must set ‘data to be analyzed’ and ‘tool and language for analysis’.
The notebook is created as follows.

1. Click the ‘+ Notebook’ button at the bottom of the workspace. The screen to select the
type of data required to create a notebook will be displayed.

2. In the Notebook creation screen, select the type of data to be analyzed in the Notebook.

❶ Data source: Load data from the data source allowed in the workspace and analyze it.
❷ Dashboard: Load data from a dashboard in the workbook saved in the relevant
workspace and perform an analysis.

❸ Chart: Load data from a chart saved in the relevant workspace and perform an analysis.
❹ Not selected: Select this option if you want to analyze with Spark in Zeppelin.

3. Select the data source, dashboard or chart to be analyzed, and click ‘Next’.
If ‘data source’ is selected as a data type

❶ Search by data source name: Search a data source allowed in the relevant workspace
by its name.

If ‘dashboard’ is selected as a data type

❶ Dashboard search area: Select a workbook in the workspace to search dashboards
within the workbook. Then select the dashboard containing data to be analyzed.

❷ Dashboard information area: Displays an overview of the dashboard selected in the
search area.

If ‘chart’ is selected as a data type

❶ Chart search area: Select a workbook in the workspace, and select a dashboard to
search its charts. Then select the chart containing data to be analyzed.

❷ Chart information area: Displays an overview of the chart selected in the search area.

4. Enter the information of the external analysis tool and language to be linked to the
notebook, and enter the name and description of the notebook. Then click ‘Done’.

❶ Data source: The data source selected in the previous step.
❷ Server type: Select Jupyter or Zeppelin as a server type. However, to do this, the
relevant server type must be set in the initialized notebook server.

❸ Development language: Select the development language for analysis. If you select
Jupyter, you can analyze data with R or Python. If you select Zeppelin, you analyze data
with Spark.

❹ Name: Enter the name of the notebook.
❺ Description: Enter the description of the notebook.

5. Once a notebook is created, a user can see the new notebook in the workspace as follows.
Click to bring up a screen where you can use the notebook.

4. Use Functions of a Notebook
Write a script in the newly created notebook and check its results. Using the REST API, you
can share the analysis code you have developed and its execution results with other users
and systems.

4.1 Detailed notebook search
Select a notebook to be analyzed in the workspace. The detailed search screen will be displayed
as follows. A user can view the data type, data source name, development language, and code
entered when creating the notebook.

❶ Name: Name of the notebook.
❷ Description: Description of the notebook.
❸ Data type: Data type used in the notebook (data source/dashboard/chart).
❹ Data source: Name of the data source/dashboard/chart imported into the notebook.
❺ Development language: Development language used in the notebook.
❻ Code: When clicked, the web-based notebook from the external tool (Jupyter /
Zeppelin) linked to the notebook launches in a new window. In that notebook, a user
can write code in the selected programming language. (See Paragraph 4.2.)

❼ API: API information for running the script in the notebook. With this function, a user
can run the script on a regular basis. Click ‘Create API’ to display the API creation
screen. (See Paragraph 4.3.)

4.2 Notebook coding
Click 'Details' in the code menu of the detailed notebook search screen. A new window for coding
the notebook will appear. At the top of this window, code that loads the dataset is inserted.
Execute the relevant cell to load the dataset, in JSON format, into the dataset object.

The above screen is displayed when ‘R’ is selected as the development language. The same dataset
loading cell is inserted when ‘Python’ is selected. If you click "Run" while coding, you can
check the result within the notebook itself. After coding is complete, click "Save."
This manual does not describe notebook coding in detail because metatron Discovery only
implements interoperability with these open source external analysis tools. For more information
on notebook coding languages and interfaces, see the documentation for the relevant analysis tool
and language.
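
The generated loading cell itself is not reproduced in this manual. As a rough Python sketch of what "loading a dataset in JSON into the dataset object" can look like, the snippet below parses a JSON payload into a list of records. The raw_json string and its field names are hypothetical stand-ins; the actual cell inserted by metatron Discovery may differ.

```python
import json

# Stand-in for the JSON payload that the generated cell would receive;
# the field names below are hypothetical.
raw_json = '[{"category": "A", "sales": 10}, {"category": "B", "sales": 25}]'

# Parse the JSON text into a Python object (here, a list of records).
dataset = json.loads(raw_json)

# Typical first checks after loading: record count and column names.
print(len(dataset))
print(sorted(dataset[0].keys()))
```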

4.3 Register a notebook API
Once a notebook is created, a user can retrieve its result by calling the REST API. When the API
is registered, a URL for retrieving the result is generated, so analysts can share their analysis
logic with others.

❶ Return type: Select the format in which the results of the notebook are returned.
 HTML: The screen displays all of the execution results of the notebook script in HTML.
 JSON: Returns a JSON object in a custom format created in a notebook script. In this
case, the response.write(…) function provided by metatron Discovery will be used. The
following is an example code for using the response.write function.
- R-based notebook: response.write(list(coefficient = 2, intercept = 0))
- Python-based notebook: response.write({'coefficient' : 2.5, 'intercept' : 0})
 None: Runs the notebook script but does not provide a return value.

❷ Name: Name of the API to be registered.
❸ Description: Description of the API to be registered.
Once a user has entered all the API information and clicked ‘Done’, the API creation is
complete, and the user can see the following REST API URL. Click ‘Result’ to view the result of
executing the URL in a pop-up window.
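
When the JSON return type is selected, another system could consume the registered API roughly as sketched below. This is an assumption-laden illustration: fetch_notebook_result and parse_result are hypothetical helper names, the URL must be the one displayed after the API is created, and the manual does not state which HTTP method or authentication the API requires (a plain GET without authentication is assumed here).

```python
import json
from urllib.request import urlopen

def parse_result(body):
    """Decode the JSON text written by the notebook script."""
    return json.loads(body)

def fetch_notebook_result(api_url):
    """Call a registered notebook API URL and decode its JSON response.

    api_url is whatever URL the API creation screen displays; no URL format
    is assumed here.
    """
    with urlopen(api_url) as response:
        return parse_result(response.read().decode("utf-8"))

# Offline demonstration using the values from the Python response.write example.
sample_body = '{"coefficient": 2.5, "intercept": 0}'
result = parse_result(sample_body)
print(result["coefficient"], result["intercept"])
```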

Part 6

Using Workbench

Contents
1. Workbench Outline
2. Create a Workbench
3. Utilize a Workbench
3.1 Basic information area
3.2 Schema and table area
3.3 Query editor area
3.4 Query result area
3.5 Extra tools area
3.5.1 Global Variable Editing
3.5.2 Search Query History List
3.5.3 Workbench navigation

1. Workbench Outline

metatron Discovery Workbench provides an environment for data pre-processing and analysis
based on SQL. The main functions are as follows.
 A user can search various external databases simultaneously using the workspace.
 A user can search/select linked tables and columns easily, and open a view with more
detailed information.
 Query edit tools are embedded and query results can be checked in real time and used in
various ways.
- Query results can be downloaded as a local file or exported to an online Excel.
- Query results can be immediately visualized to help the analyst see how the data tables
look.
- Query results can be stored as data sources so that they can be used on Workbook or
Notebook for analysis.

This part is divided as follows.
Chapter 2 Create a Workbench: Each document storing an SQL-based analysis query is called
a ‘Workbench’. This chapter describes how to create a Workbench and link a data
connection to be analyzed.
Chapter 3 Utilize a Workbench: This chapter describes how to create/execute a query from
the created workbench and visualize its results.

2. Create a Workbench
To use a workbench in the relevant workspace, a data connection for the Workbench must be
established. For more information, please see ‘2.2 Data Connection’ of ‘Part 2 Management’.
A Workbench is created as follows.

1. Click the ‘+ Workbench’ button at the bottom of the workspace. A screen to link a data
connection for data analysis will be displayed.

2. Select the data connection for the workbench to be linked and used by the user, and click
‘Next’.

❶ Search by data connection name: Search a data connection allowed in the relevant
workspace by its name.

❷ DB Type: See a data connection by database type (Oracle/MySQL/Hive/Presto/Tibero).
Select ‘All’ to see data connections regardless of database type.

❸ Account Type: See a data connection by the established account type (Enter by
manager/Use user account/Input on connection). Select ‘All’ to see data connections
regardless of account type.

❹ Data Connection: Displays data connections that meet the established criteria.

3. Confirm the information of the selected data connection and enter a name and a
description to create a workbench.

4. Once the workbench is created, a user can see a list of new and existing Workbenches in
the workspace screen as follows. Click an item in the list to open a screen where you can
use the functions of the Workbench.

3. Utilize a Workbench
In the workbench, a user can edit and manage an SQL database easily, as well as visualize and save
the query results in various forms. The workbench screen is divided into the following five areas.

❶ Basic Information Area (see Paragraph 3.1)
❷ Schema and Table Area (see Paragraph 3.2)
❸ Query Editor Area (see Paragraph 3.3)
❹ Query Result Area (see Paragraph 3.4)
❺ Extra Tools Area (see Paragraph 3.5)

3.1 Basic information area
This area displays the information of the active workbench.

❶ Name: Name of the Workbench. Click it to change the workbench's name.
❷ Data Connection: Name of the data connection linked with the workbench. Click the
icon to see more information.

❸ : UI button to collapse or expand the panel.

3.2 Schema and table area
UI function to input a specific database, table and column in the query editor conveniently.

❶ Database name: Displays the name of the selected database. By default, the first database
of the data connection registered in the workbench will be selected.

❷ Database list: Used to change the selected database. Click to search all databases
included in the data connection. Select one database to replace the currently selected
database.

❸ Schema information: Displays the table list of the selected database, and information of
all the columns and records in each table.

❹ Search: Search tables registered in the selected database by name.
❺ Table name: Select the table that contains the required data. Then the SELECT * FROM
query for the table will be entered in the query editor on the right side automatically.

❻ Table information: Displays basic table information.
❼ Column list: Displays the name of all columns in the table and the data type of each
column. Click a column name to enter it in the query editor automatically.
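
The information shown in this area (table list, column names and data types, and the generated SELECT * FROM query) can be sketched in Python. The snippet uses an in-memory SQLite database purely as a stand-in for a real data connection; the sales table and its columns are hypothetical, and this is not how metatron Discovery itself reads the schema.

```python
import sqlite3

# In-memory SQLite database as a stand-in for a real data connection.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")

# Table list of the selected database.
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]

# Column names and data types of one table.
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(sales)")]

# Query that is entered in the editor when a table name is selected.
query = "SELECT * FROM " + tables[0]

print(tables, columns, query)
conn.close()
```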

3.3 Query editor area
Editor screen to create and execute a query.

❶ : Displays previously created tabs if there are too many tabs for a screen. If there are
not too many tabs, the user can see a list of current tabs. Click a tab to move to it.

❷ Tab: To manage queries, a user can divide them into multiple tabs and execute or save
a query. Click the button to edit a tab title or delete a tab.

❸ : Click this button to add a new tab.
❹ : Click this button to minimize or maximize the query editor area to full screen.

❺ Query row: Displays the row number of query code.
❻ Editor screen: Write the query statement in this area. A user can execute a single query
or multiple queries. Insert ‘;’ at the end of every query statement to separate the
queries to be executed. An auto-complete function is provided.

❼ Max. number of rows to be searched: Set how many lines of query results will be
displayed.

❽ SQL BEAUTIFIER: Click this button to format the query statements according to query
syntax standards. Select the query to be formatted and click the button.

❾ ALL SQL EXECUTE: Execute all queries in the query statement. (Shortcut: Ctrl + Enter)
❿ SELECTED SQL EXECUTE: Execute the query in which the cursor is located, or execute
the queries in an area selected by dragging the mouse.
(Shortcut: Command + Enter)

⓫ CLEAR SQL: Clear all query statements.
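
The role of ‘;’ as a separator between queries can be sketched as follows. The split_statements function is a hypothetical helper written for illustration only; how the Workbench actually splits statements is not described in this manual. The sketch ignores semicolons inside single-quoted strings but does not handle SQL comments.

```python
def split_statements(script):
    """Split a script on ';', ignoring semicolons inside single-quoted strings."""
    statements, current, in_string = [], [], False
    for char in script:
        if char == "'":
            in_string = not in_string
        if char == ";" and not in_string:
            statement = "".join(current).strip()
            if statement:
                statements.append(statement)
            current = []
        else:
            current.append(char)
    tail = "".join(current).strip()
    if tail:
        statements.append(tail)
    return statements

script = "SELECT 1; SELECT 'a;b' AS text_value;\nSELECT 2"
queries = split_statements(script)
print(queries)
```

Running all queries corresponds to executing every element of the resulting list in order; running a selected query corresponds to executing a single element.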

3.4 Query result area
Once a query is executed, the result will be displayed in the query results tab. Every query result
will be accumulated continuously, but the user can select and delete a specific results tab. The
query result is displayed as a text grid, and Chart Preview, Save as Datasource, Export CSV file,
and Online Excel functions are provided.

❶ : Displays previously created tabs if there are too many tabs for a screen. If there
are not too many tabs, the user can see a list of current tabs. Click a tab to move to it.

❷ Tab: Name of the results tab. Click the button to delete a tab.

❸ Search by Column Data: Search columns and values in the results.
❹ Chart Preview: Draw a chart based on the query result. The chart is only for
visualization; it is not saved in the actual workspace. (For more information on how
to control the chart, see "Chapter 5 Create/Manage a Chart" in "Part 4 Workbook".)

❺ Save as Datasource: Save the query result as a data source in the workspace. A pop-up
to create a data source will be displayed, and procedures such as selecting a data
connection and table will be replaced with the results of the workbench. As a result, the
schema definition and ingestion cycle will proceed immediately. (For more detailed
procedures, see Paragraph 2.1.3 of ‘Part 3 Management’.)

❻ Export CSV file: Download the query result as a local file (CSV).
❼ Online Excel: Click the button to display the results in Online Excel. Online Excel is an
online Excel application implemented in HTML5 named Spread Sheet. A browser
containing the online Excel application will be opened in a new window, and not in the
active workbench.

❽ Data History: The data history output by executing a query. A user can copy the output
data to the clipboard.
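
As a minimal sketch of what exporting a query result to CSV involves, the snippet below serializes a header and rows with Python's csv module. The to_csv helper and the sample result are hypothetical; the exact CSV dialect produced by the Workbench is not documented here.

```python
import csv
import io

def to_csv(column_names, rows):
    """Serialize a query result (header plus rows) as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(column_names)
    writer.writerows(rows)
    return buffer.getvalue()

# Hypothetical query result; the second region contains a comma and gets quoted.
csv_text = to_csv(["region", "amount"], [("East", 15), ("West, North", 3)])
print(csv_text)
```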

3.5 Extra tools area
The extra tools area is composed of useful tools for Workbench.

3.5.1 Global Variable Editing
If the same expression is used repeatedly and you need to run queries that differ only in its
value, register it as a ‘global variable’ for convenient use.

❶ Variable type: For a variable type, calendar and text are provided.
❷ Add new variable: Select the variable type you want and click the 'Add new variable'
button. The relevant global variable will be added in the query editor area.

❸ Name: Enter the name of the variable.
❹ Variable value: For a calendar variable, select the date. For a text variable, enter the value.
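
The idea behind a global variable, a named value substituted into a query before execution, can be sketched with Python's string.Template. The ${...} placeholder syntax, the variable names, and the sales table are assumptions for illustration; this manual does not specify how a global variable appears in the query editor.

```python
from string import Template

# Query text with two placeholders; the ${...} syntax belongs to Python's
# string.Template and is only an assumption about how a variable might appear.
query_template = Template(
    "SELECT * FROM sales WHERE sale_date = '${target_date}' AND region = '${region}'")

# Calendar-type and text-type variable values (hypothetical names and values).
variables = {"target_date": "2018-06-01", "region": "East"}

query = query_template.substitute(variables)
print(query)
```

Changing only the values in the variables dictionary yields a new query without editing the query text itself, which is the convenience a global variable provides.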

3.5.2 Search Query History List
Search the history and results of an executed query or load that query.

❶ Search query: Search queries in the query history.
❷ Delete all: Delete all query data in the query history.
❸ Query list: Display information of queries executed after the last deletion of the query
history in order of execution. Click a query in the list to copy the relevant query
statement to the query editor.

❹ + More: Click this button to see more query data.

3.5.3 Workbench navigation
Used to move to another workbench. Click the target workbench to move.

❶ Search a workbench: Search workbenches saved in the workspace.
❷ Workbench list: Display all workbenches saved in the workspace. Click a workbench in
the list to move to that workbench.

Source Exif Data:

Author                          : AH YOUNG HWANG
Create Date                     : 2018:06:01 21:24:08+09:00
Modify Date                     : 2018:06:01 21:51:01+09:00
Page Count                      : 168
