D-Cube: Dense-Block Detection in Terabyte-Scale Tensors
Aug-6-2016
Kijung Shin

1 General Information
Version: 1.0
Date: Aug-6-2016
Authors: Kijung Shin, Bryan Hooi, Jisu Kim, and Christos Faloutsos

2 Introduction
D-Cube (Disk-based Dense-block Detection) is an algorithm for detecting dense blocks in web-scale tensors. D-Cube has the following properties:
- Scalable: D-Cube can handle large data that do not fit in memory or even on a disk.
- Fast: even when the data fit in memory, D-Cube is faster than its competitors.
- Accurate: D-Cube achieves high accuracy on real-world data and provides theoretical accuracy guarantees.
The method is explained in detail in the following paper:
Kijung Shin, Bryan Hooi, Jisu Kim, and Christos Faloutsos, "D-Cube: Dense-Block Detection in Terabyte-Scale Tensors", ACM International Conference on Web Search and Data Mining (WSDM) 2017, Cambridge, UK.

3 Installation
This package requires the following software to be installed on the system and available in the PATH:
- Hadoop 1.x.x, from http://hadoop.apache.org
- Java 1.6.x or higher, preferably from Sun
Optional steps:
- For compilation, type ./compile.sh
- For packaging, type ./package.sh
- For the demo, type make

4 Input File Format
The input file lists all tuples in a relation. Each line corresponds to one tuple and consists of the dimension attribute values followed by the measure attribute value, separated by commas. We additionally assume the following:
- Dimension attribute values are integers between 0 and (cardinality - 1).
- Measure attribute values are in the last column of each row.
- Measure attribute values are integers.
example_data.txt is an example input file.

5 Output File Format
For each block found, two files are created. For the n-th block found, these are:
- block_n.tuples: lists the tuples included in the n-th block. This file has the same format as the input file.
- block_n.attributes: lists the attribute values included in the n-th block. Each line consists of the order of an attribute and a value of that attribute.
The output directory contains examples of the output files. Statistics, including the volume, mass, and density of each block found, are printed to the console.

6 Running D-Cube Serial Version
How to Run
./run_single.sh input_path output_path dimension density_measure policy num_of_blocks
Parameters
- input_path: path of the input file. See Section 4 for the input file format.
- output_path: path of the local directory for output files. See Section 5 for the output file format.
- dimension: number of dimension attributes.
- density_measure: density measure to use. Must be one of [ari, geo, susp].
- policy: policy for selecting the attribute from which values are removed. Must be one of [density, cardinality].
- num_of_blocks: number of blocks to find.

7 Running D-Cube Hadoop Version
How to Run
./run_hadoop.sh input_path num_of_reducers log_path output_path dimension density_measure policy num_of_blocks
Parameters
- input_path: path of the input file in HDFS. See Section 4 for the input file format.
- num_of_reducers: number of reducers to use.
- log_path: path of the local directory for logs.
- output_path: path of the HDFS directory for output files. See Section 5 for the output file format.
- dimension: number of dimension attributes.
- density_measure: density measure to use. Must be one of [ari, geo, susp].
- policy: policy for selecting the attribute from which values are removed. Must be one of [density, cardinality].
- num_of_blocks: number of blocks to find.
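Before running D-Cube, it can help to check that an input file matches the format in Section 4. The sketch below is a hypothetical helper, not part of the D-Cube package. It parses comma-separated lines, treats the last column as the measure, rejects non-integer or negative dimension values, and infers each attribute's cardinality as the largest value plus one. That inference assumes the values run densely from 0, which is our assumption.

```python
from typing import List, Sequence, Tuple

# A parsed tuple: (dimension attribute values, measure attribute value).
Row = Tuple[Tuple[int, ...], int]


def parse_line(line: str, dimension: int) -> Row:
    """Split one comma-separated line into dimension values and the measure."""
    fields = line.strip().split(",")
    if len(fields) != dimension + 1:
        raise ValueError(f"expected {dimension + 1} columns, got {len(fields)}: {line!r}")
    values = [int(f) for f in fields]  # int() raises ValueError for non-integers
    # The measure attribute value is always the last column.
    return tuple(values[:dimension]), values[dimension]


def read_relation(lines: Sequence[str], dimension: int) -> Tuple[List[Row], List[int]]:
    """Parse all tuples and infer each attribute's cardinality as max value + 1.

    Hypothetical helper for illustration; D-Cube itself does not expose it.
    """
    rows: List[Row] = []
    cardinalities = [0] * dimension
    for lineno, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # assumption: blank lines carry no tuple
        dims, measure = parse_line(line, dimension)
        for n, value in enumerate(dims):
            if value < 0:
                raise ValueError(f"line {lineno}: attribute {n} has negative value {value}")
            cardinalities[n] = max(cardinalities[n], value + 1)
        rows.append((dims, measure))
    return rows, cardinalities
```

For example, read_relation(["0,1,2,5", "1,0,2,3"], dimension=3) returns the tuples ((0, 1, 2), 5) and ((1, 0, 2), 3), along with the inferred cardinalities [2, 2, 3].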
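Section 5 states that each line of block_n.attributes holds the order of an attribute and one of its values, but it does not name the separator. The sketch below assumes the two fields are comma-separated, as in the input file. It groups the values by attribute order and returns the number of values per attribute (the block's size along each attribute), sorted by attribute order. Both function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


def read_block_attributes(lines: Iterable[str]) -> Dict[int, Set[int]]:
    """Group attribute values of one block by attribute order.

    Assumption: each line is "order,value" (comma-separated, like the input file).
    """
    attrs: Dict[int, Set[int]] = defaultdict(set)
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        order, value = (int(f) for f in line.strip().split(","))
        attrs[order].add(value)
    return dict(attrs)


def block_sizes(attrs: Dict[int, Set[int]]) -> List[int]:
    """Number of values the block contains for each attribute, by attribute order."""
    return [len(attrs[k]) for k in sorted(attrs)]
```

Because block_sizes sorts by the attribute order, it works whether the file numbers attributes from 0 or from 1.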
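The guide reports the volume, mass, and density of each block and offers three density measures (ari, geo, susp), but it does not define them. The sketch below follows our reading of the definitions in the WSDM 2017 paper cited in Section 2, and the formulas should be checked against that paper. Here M_B is the sum of the measure values of the tuples in block B, |B_n| is the number of values of attribute n in B, and M_R and |R_n| are the corresponding quantities for the whole relation R. The function name is hypothetical, and this is not the package's own code. The code requires Python 3.8 or later for math.prod.

```python
import math
from typing import Dict, Sequence


def block_statistics(block_mass: float, block_sizes: Sequence[int],
                     rel_mass: float, rel_sizes: Sequence[int]) -> Dict[str, float]:
    """Mass, volume and three density measures of block B within relation R.

    Formulas are our reading of the D-Cube paper, not taken from the package.
    """
    n_dims = len(block_sizes)
    volume = math.prod(block_sizes)  # product of |B_n|
    # ari: arithmetic average mass, M_B / ((1/N) * sum_n |B_n|)
    ari = block_mass / (sum(block_sizes) / n_dims)
    # geo: geometric average mass, M_B / (prod_n |B_n|)^(1/N)
    geo = block_mass / volume ** (1.0 / n_dims)
    # susp: M_B*(log(M_B/M_R) - 1) + M_R*p - M_B*log(p), with p = prod_n |B_n|/|R_n|.
    # The logarithms require M_B > 0 and M_R > 0.
    p = math.prod(b / r for b, r in zip(block_sizes, rel_sizes))
    susp = (block_mass * (math.log(block_mass / rel_mass) - 1)
            + rel_mass * p - block_mass * math.log(p))
    return {"mass": block_mass, "volume": volume, "ari": ari, "geo": geo, "susp": susp}
```

For a block of mass 10 with sizes [2, 2] inside a relation of mass 20 with cardinalities [4, 4], both the arithmetic and geometric average masses equal 5.0, and the volume is 4.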
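When calling D-Cube from a script, the argument order of the two run scripts is easy to get wrong, because run_hadoop.sh inserts num_of_reducers and log_path between input_path and output_path. The sketch below builds the argument list using the orders given in Sections 6 and 7 and checks density_measure and policy against the allowed values. The function dcube_command is hypothetical. It assumes the scripts are invoked from the package directory.

```python
from typing import List, Optional

DENSITY_MEASURES = ("ari", "geo", "susp")
POLICIES = ("density", "cardinality")


def dcube_command(input_path: str, output_path: str, dimension: int,
                  density_measure: str, policy: str, num_of_blocks: int,
                  num_of_reducers: Optional[int] = None,
                  log_path: Optional[str] = None) -> List[str]:
    """Build argv for run_single.sh, or for run_hadoop.sh when reducers and a log path are given."""
    if density_measure not in DENSITY_MEASURES:
        raise ValueError(f"density_measure must be one of {DENSITY_MEASURES}")
    if policy not in POLICIES:
        raise ValueError(f"policy must be one of {POLICIES}")
    if dimension < 1 or num_of_blocks < 1:
        raise ValueError("dimension and num_of_blocks must be positive")
    # The last four arguments appear in the same order in both scripts.
    tail = [str(dimension), density_measure, policy, str(num_of_blocks)]
    if num_of_reducers is None and log_path is None:
        # Serial version: input_path and output_path are local paths.
        return ["./run_single.sh", input_path, output_path] + tail
    if num_of_reducers is None or log_path is None:
        raise ValueError("the Hadoop version needs both num_of_reducers and log_path")
    if num_of_reducers < 1:
        raise ValueError("num_of_reducers must be positive")
    # Hadoop version: input_path and output_path are in HDFS; log_path is local.
    return ["./run_hadoop.sh", input_path, str(num_of_reducers), log_path, output_path] + tail
```

The resulting list can be passed to subprocess.run(cmd, check=True), which raises an error if the script exits with a non-zero status.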
Source Exif Data:
- File Type: PDF
- File Type Extension: pdf
- MIME Type: application/pdf
- PDF Version: 1.5
- Linearized: No
- Page Count: 3
- Language: en-US
- Tagged PDF: Yes
- Author: Kijung
- Creator: Microsoft® Word 2016
- Create Date: 2016:11:22 14:05:45-05:00
- Modify Date: 2016:11:22 14:05:45-05:00
- Producer: Microsoft® Word 2016