Data Warehouse & ETL Offload Code Samples Overview and Install Guide
Copyright © 2018, Oracle and/or its affiliates. All rights reserved.
The Universal Permissive License (UPL), Version 1.0
The Data Warehouse & ETL Offload Code Samples provide sample code artifacts to support data
warehousing and ETL offload solution patterns in the Oracle Public Cloud and in an Oracle Cloud at
Customer deployment. The code artifacts and sample dataset provided are described below by folder.
Installation and execution guidelines are included below.
Code Samples & Installation Content
Pre-Requisites and Assumptions
Global Files - SampleSourceFiles
Loading Autonomous Data Warehouse
   GoldenGate_Parameter_Samples
   ODISampleAutonomousDataWarehouseCloudLoad
BigDataCloudETLOffloadSparkNotebook
CloudAtCustomerODIETLOffload
Pre-Requisites and Assumptions
These code samples are built using Oracle Cloud services (either Oracle Public Cloud or Oracle Cloud at
Customer) and are provided as-is with no express warranty. Solution documentation, along with step-
by-step guides for service integration and configuration for the Data Warehouse and ETL Offload
solution patterns, is available online at docs.oracle.com and should be read prior to implementing
these samples:
Learn how to load the data warehouse for business analytics
Load the data warehouse for business analytics on Oracle Cloud
Load the data warehouse for business analytics on Cloud at Customer
The solution documentation provides step-by-step guides to configuring the services and should be
followed prior to installation / execution of the sample workloads. To implement these samples, you
will need to have provisioned the following services:
Oracle Autonomous Data Warehouse Cloud
Oracle Data Integration Platform Cloud
Oracle Analytics Cloud
Oracle Big Data Cloud (for Oracle Public Cloud ETL Offload)
Oracle Big Data Cloud @ Customer (for Cloud@Customer ETL Offload)
Global Files - SampleSourceFiles
SampleSourceFiles provides a set of .csv files that are used throughout the solution. These files are used
in both ODI Smart Export samples (loading ADWC and ETL Offload for BigDataCloud@Customer) as well
as in the sample BigDataCloud Notebook for ETL Offload / Spark processing in the Oracle Public Cloud.
CUSTOMER_SRC_FILE.csv provides a set of made-up customer data (customer names are random letters
and numbers). This is used to load the CUSTOMER dimension and to provide REGION data for the ETL
Offload Spark sample.
Column Ordinal   Column Name   DataType
1                CUSTOMER_ID   NUMERIC
2                FIRST_NAME    VARCHAR
3                LAST_NAME     VARCHAR
4                REGION        VARCHAR
5                CITY          VARCHAR
SRC_PRODUCT.csv provides a set of product data. This is used to load the PRODUCT dimension in
ADWC and to provide Product Category / Family data for the ETL Offload Spark sample.
Column Ordinal   Column Name    DataType
1                PRODUCT_ID     NUMERIC
2                PRODUCT_NAME   VARCHAR
3                PRICE          NUMERIC
4                FAMILY         VARCHAR
ORDERS_FILE.csv provides a set of order data. This is used to load the SALES_FACT fact table in ADWC
(and the subsequent SALES_ANALYSIS table). It provides the dataset for the SALES_ANALYSIS ETL
Offload Spark Notebook and ODI project.
Column Ordinal   Column Name         DataType
1                ORDER_ID            NUMERIC
2                CUSTOMER_ID         NUMERIC
3                ORDER_DATE          DATE
4                PRODUCT_ID          NUMERIC
5                AMOUNT              NUMERIC
6                QTY                 NUMERIC
7                ORDER_LINE_NUMBER   NUMERIC
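
For reference, the three file layouts above can be expressed as explicit Spark schemas when working
with the files outside of ODI. The sketch below is illustrative only: the file paths are placeholders,
and it assumes comma-delimited files with no header row and an ISO date format (adjust the read
options, and the numeric precisions, to match the actual files).

# Minimal PySpark sketch of the sample file layouts.
# Illustrative only - paths, header handling, and date format are assumptions.
from pyspark.sql import SparkSession
from pyspark.sql.types import (StructType, StructField, StringType,
                               IntegerType, DecimalType, DateType)

spark = SparkSession.builder.appName("SampleSourceFiles").getOrCreate()

customer_schema = StructType([
    StructField("CUSTOMER_ID", IntegerType()),
    StructField("FIRST_NAME", StringType()),
    StructField("LAST_NAME", StringType()),
    StructField("REGION", StringType()),
    StructField("CITY", StringType()),
])

product_schema = StructType([
    StructField("PRODUCT_ID", IntegerType()),
    StructField("PRODUCT_NAME", StringType()),
    StructField("PRICE", DecimalType(10, 2)),     # precision is an assumption
    StructField("FAMILY", StringType()),
])

orders_schema = StructType([
    StructField("ORDER_ID", IntegerType()),
    StructField("CUSTOMER_ID", IntegerType()),
    StructField("ORDER_DATE", DateType()),
    StructField("PRODUCT_ID", IntegerType()),
    StructField("AMOUNT", DecimalType(10, 2)),
    StructField("QTY", IntegerType()),
    StructField("ORDER_LINE_NUMBER", IntegerType()),
])

customers = spark.read.csv("/path/to/SampleSourceFiles/CUSTOMER_SRC_FILE.csv",
                           schema=customer_schema, header=False)
products = spark.read.csv("/path/to/SampleSourceFiles/SRC_PRODUCT.csv",
                          schema=product_schema, header=False)
orders = spark.read.csv("/path/to/SampleSourceFiles/ORDERS_FILE.csv",
                        schema=orders_schema, header=False,
                        dateFormat="yyyy-MM-dd")  # date format is an assumption
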
Loading Autonomous Data Warehouse
This folder provides artifacts demonstrating how to replicate a source table to ADWC for reporting with
GoldenGate and how to load a star schema data warehouse with ODI. (Note: this project can also be
used against an Exadata Cloud at Customer deployment, but it may require you to change the knowledge
modules used in the solution.)
GoldenGate_Parameter_Samples
Artifacts in this folder are sample GoldenGate parameter files for real-time replication of data
from an Oracle database source (on-premises or DBCS) to an ADWC target. These files can be used with
any GoldenGate for Oracle 12.3 implementation, including the Data Integration Platform Cloud's
remote agent.
exadwc.prm is a sample GoldenGate EXTRACT parameter file to capture from an Oracle database
source. To configure this file, replace the <SID> entry with your Oracle database SID or CDB name.
This file is configured to capture from the schema ADWC_SRC.
padwc.prm is a sample GoldenGate EXTRACT PUMP parameter file to pump transactions captured by the
extract to the Data Integration Platform Cloud instance for delivery to Autonomous Data Warehouse
Cloud. To configure this file, replace the <remote_host_ipt> and <SID> entries.
radwc.prm is a sample GoldenGate REPLICAT parameter file for a Classic Replicat. It applies the
transactions to the Autonomous Data Warehouse Cloud instance. To configure this file, replace the
<SID> entry with your SID or CDB name. This applies data from the ADWC_SRC schema to the
adwc_repl schema in your ADWC instance.
Place these files in the dirprm directory of your GoldenGate installation (exadwc.prm and padwc.prm
on the source, and radwc.prm on your DIPC host / DIPC remote agent installation).
Follow the instructions in the Data Warehouse Solution Pattern for information on configuring
GoldenGate within the DIPC remote agent and where to leverage these files.
ODISampleAutonomousDataWarehouseCloudLoad
These artifacts provide create-user / DDL scripts to create a small star schema in the Autonomous Data
Warehouse Cloud, plus an ODI Smart Export to load the Autonomous Data Warehouse Cloud star
schema.
To import / execute these loads, perform the following steps:
Connect to your ADWC instance via SQL Developer
Run the CreateODI_USER_and_TABLES.sql script to create the ODI_USER and the target tables
for the ODI project. Note: replace the <PASSWORD> tag in the first line of the script with
the password you wish to use for the ODI user.
SSH into your Data Integration Platform Cloud instance as the opc user
Create a directory on your DIPC instance
o mkdir /tmp/ADWCSample
Copy the source files ORDERS_FILE.csv, SRC_PRODUCT.csv, and CUSTOMER_SRC_FILE.csv, along with
ODIADWCSample.xml, to the /tmp/ADWCSample directory you just created on your DIPC instance
Change permissions on the directory and files to ensure the oracle user can read them
o sudo chmod -R 755 /tmp/ADWCSample
VNC into your DIPC instance.
Run ODI Studio
Use Smart Import to import the ODI processes.
Navigate to the Topology Manager in ODI Studio and alter the ADWC connection strings (review
the Solution Documentation for more details on how to obtain / configure your ODI / ADWC
connection).
Open the ODI project and review the mappings.
To run the processes, execute the ODI package PKG ODI_User DW Sample Load.
Using SQL Developer, query the tables in the ADWC ODI_USER schema:
o CUST_DIM
o DATE_DIM
o PRODUCT_DIM
o SALES_FACT
o SALES_FACT_ANALYSIS
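
The check above uses SQL Developer. As an optional scripted alternative, the sketch below uses the
python-oracledb thin driver with a downloaded ADWC credentials wallet to count rows in the ODI_USER
tables. The service name, wallet paths, and passwords are placeholders for your own instance.

# Optional row-count check of the ODI_USER tables (alternative to SQL Developer).
# All connection details below are placeholders for your ADWC instance.
import oracledb

conn = oracledb.connect(
    user="ODI_USER",
    password="<PASSWORD>",            # password chosen in CreateODI_USER_and_TABLES.sql
    dsn="<adwc_service>_low",         # TNS alias from the wallet's tnsnames.ora
    config_dir="/path/to/wallet",     # unzipped ADWC credentials wallet
    wallet_location="/path/to/wallet",
    wallet_password="<WALLET_PASSWORD>",
)

tables = ["CUST_DIM", "DATE_DIM", "PRODUCT_DIM", "SALES_FACT", "SALES_FACT_ANALYSIS"]
with conn.cursor() as cur:
    for table in tables:
        cur.execute(f"SELECT COUNT(*) FROM {table}")
        print(table, cur.fetchone()[0])
conn.close()
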
Review the ODI mappings. Note that the two folders provide two methods for executing dimension and fact
loads in ADWC:
Loads Using Mappings and KM's provides mappings that leverage certified Knowledge Modules for
ADWC. These are appropriate for Type 1 dimension-style loads.
Loads using Dimensions and Cubes objects provides mappings that leverage the Dimensions and Cubes
objects within ODI to load the same set of tables into the ADWC ODI_USER schema. The Dimensions
objects are appropriate for Type 2 or Type 3 dimension loads. (Type 1 loads overwrite changed attribute
values, while Type 2 and Type 3 loads retain history.)
BigDataCloudETLOffloadSparkNotebook
This artifact is a sample BigDataCloud Notebook that demonstrates using Spark to load data from files
stored in Oracle Object Storage, perform an ETL routine leveraging SparkSQL, and then store the result
in multiple file formats back in Object Storage (all running in the Oracle Public Cloud).
To configure and execute the Spark notebook:
Log in to your Oracle Public Cloud account.
Create a container within your Oracle Object Storage service (for example, BDCETL).
Navigate to your BigDataCloud instance and the Notebook tab.
Click Import to import the notebook (point to the JSON file and import).
Open the notebook and click the run / play button to execute the Spark script.
The Spark notebook generates a SALES_FACT_ANALYSIS data set similar to the results of the ODI ADWC
processes.
To review the results, execute the %SQL section of the notebook, which queries the resulting
SALES_ANALYSIS table in BigDataCloud.
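
For orientation, the general shape of such a notebook is sketched below. This is not the notebook's
actual code: the Object Storage path, temp-view names, and the SALES_ANALYSIS join/aggregation logic
are assumptions based on the sample file layouts described earlier.

# Illustrative outline only - read from Object Storage, transform with SparkSQL,
# and write results back in multiple formats. Paths and logic are assumptions.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("ETLOffloadSketch").getOrCreate()

base = "<object_storage_path>/BDCETL"   # placeholder for your container path

orders = (spark.read.csv(f"{base}/ORDERS_FILE.csv", inferSchema=True)
          .toDF("ORDER_ID", "CUSTOMER_ID", "ORDER_DATE", "PRODUCT_ID",
                "AMOUNT", "QTY", "ORDER_LINE_NUMBER"))
customers = (spark.read.csv(f"{base}/CUSTOMER_SRC_FILE.csv", inferSchema=True)
             .toDF("CUSTOMER_ID", "FIRST_NAME", "LAST_NAME", "REGION", "CITY"))
products = (spark.read.csv(f"{base}/SRC_PRODUCT.csv", inferSchema=True)
            .toDF("PRODUCT_ID", "PRODUCT_NAME", "PRICE", "FAMILY"))

orders.createOrReplaceTempView("ORDERS")
customers.createOrReplaceTempView("CUSTOMERS")
products.createOrReplaceTempView("PRODUCTS")

# SparkSQL ETL step: join orders to customer REGION and product FAMILY.
sales_analysis = spark.sql("""
    SELECT c.REGION, p.FAMILY,
           SUM(o.AMOUNT) AS TOTAL_AMOUNT, SUM(o.QTY) AS TOTAL_QTY
    FROM ORDERS o
    JOIN CUSTOMERS c ON o.CUSTOMER_ID = c.CUSTOMER_ID
    JOIN PRODUCTS p ON o.PRODUCT_ID = p.PRODUCT_ID
    GROUP BY c.REGION, p.FAMILY
""")
sales_analysis.createOrReplaceTempView("SALES_ANALYSIS")

# Store the result back to Object Storage in multiple file formats.
sales_analysis.write.mode("overwrite").parquet(f"{base}/SALES_ANALYSIS_parquet")
sales_analysis.write.mode("overwrite").json(f"{base}/SALES_ANALYSIS_json")

# The %SQL review step corresponds to a query like:
spark.sql("SELECT * FROM SALES_ANALYSIS").show()
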
CloudAtCustomerODIETLOffload
These artifacts demonstrate ODI performing ETL Offload in a BigData Cloud@Customer (Cloudera)
environment. They leverage the SourceFiles sample dataset to ingest data into the Cloudera cluster
(via SQOOP or Spark), and then ODI executes Spark workloads to generate the SALES_ANALYSIS data set.
To configure and execute the ODI jobs:
Follow the documentation on configuring your BigData Cloud at Customer machine for ODI
workloads. However, do not create your topology objects yet.
SSH / VNC into your ODI agent node in your cluster
Create a directory in your cluster's local file system
o mkdir /tmp/sourcefiles
Copy the ODISmartExport_ETLOffload_BigDataCloud@Customer.zip file to your ODI agent node
in your cluster.
Copy the source files ORDERS_FILE.csv, SRC_PRODUCT.csv, and CUSTOMER_SRC_FILE.csv to the
/tmp/sourcefiles directory you just created on your ODI agent node
Run ODI Studio
Smart Import the ODI project into your environment
Configure your topology connections based on the Solution documentation step by step guide.
Edit the ORCL_SRC connection to point to an Oracle database that you have (the ingestion
mappings pull from an Oracle database to demonstrate how to leverage either SQOOP or Spark to
ingest data into your cluster)
Follow the Solution documentation instructions on configuring your Hadoop credential store for
the ORCL_SRC connection if you plan on using Spark to ingest the data
Review the ODI project.
0. Create Oracle Source Objects
o The PKG Create Oracle Objects and Load Data will create tables in your source Oracle
database and load the SourceFiles to that database.
1. Ingest Data to BigData HDFS SQOOP & Spark
o The PKG Ingest Data to Hive leverages ODI mappings to import data into your Big Data
cluster leveraging SQOOP.
o Open one of the mappings and navigate to the physical tab. Note that there are
physical designs for both SQOOP and Spark ingestion. Either can be leveraged to move
data into the cluster (see the Spark ingestion sketch after this list).
2. ETL Offload Spark
o The Mapping MAP Spark ETL Offload demonstrates how to create a mapping in ODI
that leverages Spark to perform ETL offload to the cluster and create a SALES_ANALYSIS
data set.
Execute the packages / mappings listed above in order to
o Create Oracle source objects
o Ingest data into your cluster
o Generate a SALES_ANALYSIS dataset leveraging ODI’s Spark capabilities.
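
As a point of reference for the Spark ingestion physical design mentioned above, the sketch below
shows how a Spark job can pull a table from an Oracle source over JDBC and land it in Hive. It is
illustrative only: the JDBC URL, credentials, driver availability, and table names are assumptions,
and in this solution the actual ingestion jobs are generated by the ODI mappings rather than
hand-written code.

# Illustrative Spark-based ingestion from an Oracle source into Hive.
# The ODI mappings generate equivalent Spark (or SQOOP) jobs; this is only a sketch.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("SparkIngestSketch")
         .enableHiveSupport()          # assumes the cluster's Hive metastore is reachable
         .getOrCreate())

# Placeholder connection details for the ORCL_SRC source.
jdbc_url = "jdbc:oracle:thin:@//<oracle_host>:1521/<service_name>"

orders_src = (spark.read.format("jdbc")
              .option("url", jdbc_url)
              .option("dbtable", "<source_schema>.ORDERS")   # hypothetical source table
              .option("user", "<username>")
              .option("password", "<password>")
              .option("driver", "oracle.jdbc.OracleDriver")  # Oracle JDBC driver on the classpath
              .load())

# Land the data in Hive for the downstream Spark ETL offload mapping.
orders_src.write.mode("overwrite").saveAsTable("default.orders_src")
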
