Instructions To Run The Python Files
INSTRUCTIONS

The submission has two Python files, config-parameters.py and Spark-With-Py.py. To use the maximum available resources on the cluster when running Spark-With-Py.py, we need to pass a few SparkContext configuration parameters, and the number of files to read in each iteration, when submitting the job. config-parameters.py computes those parameters.

1. config-parameters.py and Spark-With-Py.py can be run only from the command line.
2. Logging is enabled only for Spark-With-Py.py, and the default logging level is ERROR.
3. To capture timing and other information in the log file, the logging level must be INFO. Nothing is logged at any other level.
4. Memory profiling is not done, because it slows execution and is needed only when the program hits an issue that has to be debugged.
5. To run config-parameters.py from the command line, use the following:
   a. python config-parameters.py
   b. For example: python config-parameters.py 6 16 64
6. To run Spark-With-Py.py from the command line, use the following:
   a. spark-submit --master= --num-executors --executor-cores --executor-memory
   b. This will create an empty log file.
7. To write to the log file, provide a logging level by adding the following to the above command line as the last argument:
   a. --log=
   b. e.g. --log=info
8. The logging level must be given without quotes. The available levels are debug, info, warning, error and critical. If an incorrect logging level is provided, the program will report it and exit.
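The instructions do not say how config-parameters.py derives its values or what its three arguments mean. As a rough illustration only, the sketch below assumes the arguments are the number of worker nodes, cores per node, and memory per node in GB, and applies a commonly used sizing heuristic: reserve one core and 1 GB per node, use about five cores per executor, keep one executor slot free for the application master, and hold back a small share of memory for overhead. The function name, the defaults, and the heuristic itself are assumptions, not the actual contents of config-parameters.py, and the sketch does not cover the number of files to read per iteration.

```python
from collections import namedtuple

ExecutorSettings = namedtuple(
    "ExecutorSettings", ["num_executors", "executor_cores", "executor_memory_gb"]
)


def suggest_executor_settings(nodes, cores_per_node, mem_gb_per_node,
                              cores_per_executor=5, overhead_fraction=0.07):
    """Hypothetical sizing heuristic; not taken from config-parameters.py."""
    # Reserve one core and 1 GB on every node for the OS and cluster daemons.
    usable_cores = cores_per_node - 1
    usable_mem_gb = mem_gb_per_node - 1

    executors_per_node = usable_cores // cores_per_executor
    if executors_per_node < 1:
        raise ValueError("not enough cores per node for a single executor")

    # Keep one executor slot free for the application master.
    num_executors = nodes * executors_per_node - 1

    # Split node memory across its executors, then hold back a share for
    # memory overhead so each request stays within the node's limit.
    mem_per_executor = usable_mem_gb / executors_per_node
    executor_memory_gb = int(mem_per_executor * (1 - overhead_fraction))

    return ExecutorSettings(num_executors, cores_per_executor, executor_memory_gb)


# Assumed reading of the example arguments: 6 nodes, 16 cores, 64 GB each.
settings = suggest_executor_settings(6, 16, 64)
print("--num-executors %d --executor-cores %d --executor-memory %dG" % settings)
```

Under these assumptions the printed flags could be pasted into the spark-submit command from step 6. The real script may use different rules.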
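The --log behaviour described in steps 7 and 8 can be sketched with the standard argparse and logging modules. This is a minimal sketch of how such an option might be handled, not the code in Spark-With-Py.py; the function name parse_log_level, the log file name, and the error message wording are assumptions made for illustration.

```python
import argparse
import logging
import sys

VALID_LEVELS = ("debug", "info", "warning", "error", "critical")


def parse_log_level(argv):
    """Return the numeric logging level requested via --log=<level>."""
    parser = argparse.ArgumentParser()
    # ERROR is the default, matching the default level stated in step 2.
    parser.add_argument("--log", default="error")
    # Ignore any other arguments the script may receive.
    args, _unknown = parser.parse_known_args(argv)

    name = args.log.lower()
    if name not in VALID_LEVELS:
        # Report the problem and exit, as step 8 describes.
        raise SystemExit(
            "Invalid log level: %s (expected one of: %s)"
            % (args.log, ", ".join(VALID_LEVELS))
        )
    return getattr(logging, name.upper())


if __name__ == "__main__":
    level = parse_log_level(sys.argv[1:])
    # "spark_with_py.log" is a placeholder file name.
    logging.basicConfig(filename="spark_with_py.log", level=level)
    logging.info("Only written when the level is info or debug")
```

With this shape, running without --log leaves the log file empty because only INFO messages are emitted and the default threshold is ERROR.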
Source document: Microsoft Word - Instructions_to_run_the_python_files.docx (PDF, 1 page), created 2017-05-05 23:15:44Z.