User Guide

User Manual:

Open the PDF directly: View PDF PDF.
Page Count: 3

DownloadUser Guide
Open PDF In BrowserView PDF
D-Cube: Dense-Block Detection in Terabyte-Scale Tensors
Aug-6-2016
Kijung Shin

1

2

General Information


Version: 1.0



Date: Aug-6-2016



Authors: Kijung Shin, Bryan Hooi, Jisu Kim, and Christos Faloutsos

Introduction
D-Cube (Disk-based Dense-block Detection) is an algorithm for detecting dense blocks in web-scale
tensors. D-Cube has the following properties:



Scalable: D-Cube can handle large data not fitting in memory or even on a disk.



Fast: Even when data fit in memory, D-Cube outperforms its competitors in terms of speed.



Accurate: D-Cube gives high accuracy in real-world data as well as theoretical accuracy guarantees.

Detailed information about the method is explained in the following paper


3

Kijung Shin, Bryan Hooi, Jisu Kim, and Christos Faloutsos, “D-Cube: Dense-Block Detection in
Terabyte-Scale Tensors”, ACM International Conference on Web Search and Data Mining (WSDM)
2017, Cambridge, UK

Installation


This package requires the following software to be installed in the system and set in PATH.



Hadoop 1.x.x. from http://hadoop.apache.org



Java 1.6.x. or higher, preferably from sun



For compilation (optional), type ./compile.sh



For packaging (optional), type ./package.sh



For demo (optional), type make

1

4

Input File Format
The input file lists all tuples in a relation. Each line corresponds to a tuple and consists of dimension
attributes values and a measure attribute value, which are separated by a comma. Additionally, we assume
the followings:


Dimension attributes values are integers between 0 and (cardinality -1).



Measure attribute values are in the last column of each row



Measure attribute values are integers

example_data.txt is an example input file.

5

Output Files Format
For each found block, two files are created. For example, for the n-th found block, the following two files are
created:


block_n.tuples: this file lists tuples included in the n-th block. This file has the same format with
the input file.



block_n.attributes: this file lists attribute values included in the n-th block. Each line consists of the
order of an attribute and a value of the attribute.

output directory contains the examples of the output files. Statistics, including the volumes, masses, and
densities of found blocks, are printed in the console.

6

Running D-Cube Serial Version


How to Run
./run_single.sh input_path output_path dimension density_measure policy num_of_blocks



Parameters


input_path: path of the input file. See 4 for the detailed format of the input file



output_path: path of the local directory for output files. See 5 for the detailed format of the output
files



dimension: number of dimension attributes



density_measure: density measure to use. This parameter should be one among [ari, geo, susp]



policy: policy to use for selecting attribute from which values are removed: This parameter should
be one among [density, cardinality]



num_of_blocks: number of blocks to find
2

7

Running D-Cube Hadoop Version


How to Run
./run_hadoop.sh input_path
num_of_reducers log_path



output_path

dimension

density_measure

policy

num_of_blocks

Parameters


input_path: path of the input file in HDFS. See 4 for the detailed format of the input file



output_path: path of the HDFS directory for output files. See 5 for the detailed format of the output
files



dimension: number of dimension attributes



density_measure: density measure to use. This parameter should be one among [ari, geo, susp]



policy: policy to use for selecting attribute from which values are removed: This parameter should
be one among [density, cardinality]



num_of_blocks: number of blocks to find



num_of_redcuers: number of reducers to use



log_path: path of the local directory for logs.

3



Source Exif Data:
File Type                       : PDF
File Type Extension             : pdf
MIME Type                       : application/pdf
PDF Version                     : 1.5
Linearized                      : No
Page Count                      : 3
Language                        : en-US
Tagged PDF                      : Yes
Author                          : Kijung
Creator                         : Microsoft® Word 2016
Create Date                     : 2016:11:22 14:05:45-05:00
Modify Date                     : 2016:11:22 14:05:45-05:00
Producer                        : Microsoft® Word 2016
EXIF Metadata provided by EXIF.tools

Navigation menu