https://milegroup.github.io/gasatad/

GASATaD – USER'S GUIDE
Version 1.4

Daniel Pereira Alonso, Leandro Rodríguez-Liñares, María J. Lado
Escola Superior Enxeñería Informática de Ourense
Universidade de Vigo
Ourense, Spain
dpalonso@esei.uvigo.es, leandro@uvigo.es, mrpepa@uvigo.es

TABLE OF CONTENTS

1. OVERVIEW
2. GETTING STARTED
2.1. License and Disclaimer
2.2. System Requirements
2.3. Downloading GASATaD
2.4. Installing GASATaD
2.5. Uninstalling GASATaD
2.6. Updating GASATaD
2.7. Known issues
3. USING GASATaD
3.1. Running GASATaD
3.2. Main window
3.3. Task Bar
3.3.1. File
3.3.2. Edit
3.3.3. Options
3.3.4. About
3.4. Left Panel
3.4.1. Basic statistics
3.4.2. Significance tests
3.4.3. Histogram Plot
3.4.4. Scatter Plot
3.4.5. Pie Chart
3.4.6. Box Plot
3.4.7. Bar Chart

1. OVERVIEW

Statistical analysis is becoming increasingly important, since measurement systems, applications and programs yield vast amounts of data that must be analyzed. When performing statistical analysis, data collection is an essential task [1]. Moreover, data can be available in many different forms, depending on the type of study. Apart from the data format, selecting adequate modeling and analysis methods is key to obtaining coherent, conclusive results [2].

Many statistical software packages for data analysis are currently available. Many of them are commercial, proprietary applications, and users must pay for a license and/or maintenance. As an alternative, more and more free, open source programs are being published, allowing users to perform statistical analyses at no cost. Sometimes, due to the nature of the study, standard software routines are not adequate, and customizing the existing code is a good option. This is one of the main advantages of free software: users can modify the code to adapt it to their particular needs.

Frequently, researchers need to unify software functionality into a single open source, easily extensible tool. To meet these needs, we have developed GASATaD, a free, open source software package for statistical analysis that can be used to analyze data coming from different files. GASATaD includes an intuitive graphical interface, since the central ideas behind its design are ease of installation and ease of use.
GASATaD has been implemented in Python [3], an object-oriented programming language (it also supports imperative and functional programming) that yields clean, legible code, which eases software maintenance. In addition, it uses memory quite efficiently and is very extensible, thanks to the libraries available to programmers.

The main functionalities of GASATaD are:
● It can merge data from more than one file (.csv and .xlsx formats can be imported).
● It can calculate basic statistics.
● It can compare data using different significance tests.
● It can plot data and export figures to different graphic formats.

Details of the implementation and installation instructions are given in the next Section.

[1] Efron, Bradley, and Robert Tibshirani. Statistical data analysis in the computer age. Science (1991): 390-395.
[2] Ott, R. Lyman, and Michael T. Longnecker. An introduction to statistical methods and data analysis. Nelson Education, 2015.
[3] http://python.org

2. GETTING STARTED

2.1. License and Disclaimer

Copyright (c) 2018 LIA2 Research Group, University of Vigo, Spain

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

2.2. System Requirements

Binaries of GASATaD are available for Microsoft Windows, Debian-based GNU Linux platforms and Apple MacOS. Minimal system requirements are:

Microsoft Windows Systems: Windows XP or newer, 32 or 64 bits.
Apple MacOS: Mac OSX Lion or newer.
GNU Linux: any recent distribution based on deb packages. Tested on Ubuntu 16.04 LTS Xenial Xerus, Debian 9 stretch and Linux Mint 18.3 Sylvia.

2.3. Downloading GASATaD

GASATaD is available from https://milegroup.github.io/gasatad/. To download GASATaD, click either the DOWNLOAD button or the DOWNLOAD link in the upper right corner, and select the version corresponding to your operating system (Figure 1). Version 1.4 is the stable, recommended version, and is the one explained in this manual. In this documentation, X and Y refer to the version and the subversion of the GASATaD program.

Apart from the binaries for the different operating systems, you can also download the source code as a .zip file, as well as example files in .csv and .xlsx formats.

Figure 1. Binary files downloadable from

2.4. Installing GASATaD

Installation of GASATaD is as follows.

Microsoft Windows
The Windows binary is available as a GASATaD_X_Y.msi file. Installation is straightforward for each version: double-click on the file and follow the instructions to complete the installation process. Once the license agreement is accepted, GASATaD is installed on your system. The installation also creates a link to the tool on your desktop.

Apple MacOS
Installation of GASATaD on a MacOS system is done by a script available on the program web page.
The only thing to do is to open a terminal and paste the following text:

bash <(curl -fsSL https://github.com/milegroup/gasatad/raw/master/docs/packages/GASATaD_1_4.MacInstall.sh)

This installs GASATaD (root access rights are required) and all the packages it depends on. The download and installation process takes a few minutes and needs approximately 900 MB of disk space.

GNU Linux
GASATaD is distributed as a GASATaD_X.Y_all.deb package. The easiest way to install it is to download this package, open a terminal and change to the directory where the file is. Then use the following commands:

$ sudo apt install gdebi
$ sudo gdebi GASATaD_X.Y_all.deb

gdebi will install GASATaD and its dependencies. The program is available both in the Start menu and in console mode as GASATaD. You can also install GASATaD using the Software Centre available in some distributions of GNU Linux.

Source Code
GASATaD sources are distributed as a file named GASATaD_X.Y.zip. Advanced users can run the program from the source code. This requires a working installation of the Python programming language (version 2), including the following libraries:
● Matplotlib
● wxPython
● Numpy
● Scipy
● Pandas
● xlrd, xlwt and openpyxl

On Debian-based Linux systems, just open a terminal and use the following command:

$ sudo apt install python-numpy python-wxgtk3.0 python-matplotlib python-scipy python-pandas python-xlrd python-xlwt python-openpyxl

Then go to the directory containing the .zip file, uncompress it and use:

$ python GASATaD_X_Y.py

2.5. Uninstalling GASATaD

To remove GASATaD from your computer, use the following instructions.

Microsoft Windows
To uninstall GASATaD, go to Start -> Control Panel -> Add or Remove programs, select GASATaD and press the Remove button.
Apple MacOS
To uninstall GASATaD, just open a terminal and type:

bash <(curl -fsSL https://github.com/milegroup/gasatad/raw/ghpages/packages/GASATaD.MacUninstall.sh)

GNU Linux
To uninstall GASATaD, open a terminal and use the command:

$ sudo apt remove GASATaD

2.6. Updating GASATaD

Updating GASATaD when a new version is available is easy: just install the new version as if it were a fresh install.

2.7. Known issues

Apple MacOS
There is a problem with window focus in the Apple MacOS implementation of the wxPython libraries. As a result, sometimes, just after running GASATaD, new windows are not shown. Normally, clicking on the GASATaD icon in the dock solves this problem.

GNU Linux
If the Linux version is launched from a terminal, a warning (Gtk-WARNING **: Unable to retrieve the file info...) may appear in the terminal when saving a new project. This is caused by the implementation of file dialogs in some libraries. Users can ignore this message because the file is saved correctly.

3. USING GASATaD

3.1. Running GASATaD

GASATaD provides a wide range of functionalities related to statistical analysis, and can be used easily and intuitively, presenting results in an illustrative way. To run GASATaD, just double-click on the corresponding icon (Figure 2).

Figure 2. GASATaD icon: double-click to run the application.

When GASATaD opens, a notification button may appear at the lower left corner if a new version is available (Figure 3). By clicking on this button, the new version can be downloaded and installed.

Figure 3. Initial view of GASATaD indicating a new version.

Once the application is running, the appearance of GASATaD is quite similar to that of other widely used software (Figure 4).

Figure 4. Initial view of GASATaD.

3.2. Main window

The main window of GASATaD is composed of several areas, which together give the user access to all its functionalities:
● Task bar: contains the File, Edit, Options and About menus.
● Right panel: data are displayed in this area when files are loaded. Columns are labelled with letters in alphabetical order, while rows are numbered from 1 to the total number of rows in the file. Initially, interactions with rows and columns are not allowed; operations on rows and columns become possible only once a file is loaded.
● Left panel: two areas can be distinguished. The upper one shows some information about the data; just below it are the options to manage data and files and to perform statistical analyses (basic statistics or significance tests). Users can also generate a wide variety of plots. Initially, all these options are disabled; they become available automatically when a data file is loaded.

When GASATaD starts, only a few actions can be performed: the user can open a new file (File menu), select the format of file (Options menu) or consult the information in the About menu. The remaining functionalities are activated once data are available and the corresponding action becomes possible. A detailed description of all the options is given in the next Sections of this manual.

The data used for demonstration purposes are fictitious, created specifically for this task. They can be downloaded as files testfile1.xlsx and testfile2.xlsx, available on the GASATaD website.

3.3. Task Bar

The File, Edit, Options and About menus are included in the task bar, allowing users to perform different operations.

3.3.1. File

Open new file...
A new file to process can be opened with this option. As an example, we have used testfile1.xlsx. The shortcut Ctrl+N can also be used to open the file, as well as the Open new file option on the left panel.
Once the file is opened, all the options of GASATaD become available. Numerical values appear in white cells, while a yellow background is reserved for textual, categorical variables.

Add file...
Any number of additional files can be added to the data by clicking on this option, using the shortcut Ctrl+O, or clicking on the Add file button on the left panel. For this to work, the number of rows in the file must match the number of rows of the data already loaded. If not, an error message appears (Figure 5).

Figure 5. Error message displayed when opening files with different numbers of rows.

When an additional file is correctly opened, its data are added to the existing data as new columns. If column names coincide, some characters are appended to the name of the new column (in Figure 6, “Case” is renamed to “Case_2”). Cells can be edited by double-clicking on them and modifying their values. Empty cells are marked as “null” and are not considered in the analyses.

Figure 6. Combining data from different files in GASATaD.

Save data...
The data present in GASATaD can be saved to a new file by clicking on the Save data... option in the File menu. Alternatively, users can type Ctrl+S or press the Save data button on the left panel. As an example, combined data from files testfile1.xlsx and testfile2.xlsx were saved. A dialogue box appears, and the user can select the output format, either .csv or .xlsx. If the user selects the .csv option, a dialogue window with the CSV export options is opened.

Close data
Data can be closed by clicking on this option, typing Ctrl+W, or pressing the corresponding button on the left panel. When this operation is finished, functionalities are disabled until the user opens a new file.

Quit
To exit the application, select this option or, alternatively, type the shortcut Ctrl+Q.

3.3.2. Edit

In this menu, functionalities to deal with data in rows and columns are included.
When one or more rows or columns are selected, the applicable options become available. Many of the column options are also displayed by selecting the corresponding column and clicking the right mouse button (Figure 7). Each option is explained below.

Figure 7. Functionalities on columns available using right-click.

Undo
The last operation can be undone by clicking on this option.

Delete selected columns/rows
When a row or column (or a group of them) is selected, users can delete it. This task can also be performed using the right mouse button. In our example, column “Case_2” has been removed from the data (Figure 8).

Figure 8. Deleting a selected column.

Rename selected column
When this option is selected, a dialogue box appears, and the column/row name can be changed (Figure 9).

Figure 9. Renaming a column.

Move selected column
Sometimes it is useful to move a column from one position to another. This can be done with this option. A dialogue box is displayed in which the user can select the new position for the selected column (Figure 10).

Figure 10. Moving a selected column to another position.

Sort using selected column
By selecting a column and clicking on this option, all data can be sorted according to the selected column. As an example, rows were sorted according to increasing values of the “Height” column (Figure 11).

Figure 11. Sorting rows according to the “Height” column.

Convert selected column to text
On some occasions, it is necessary to convert numerical values to text, so that a numerical variable becomes a categorical variable. This functionality is available in GASATaD through this option. As an example, the “Age” column has been converted to text, and its values now appear with a yellow background, as expected (Figure 12).

Figure 12. Converting “Age” data to text.
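The sorting and text-conversion operations described above have close analogues in pandas, one of the libraries GASATaD depends on. The following is a minimal sketch for readers working from the source code; it is not GASATaD's actual implementation, and the table contents and the choice of `sort_values`, `astype(str)` and `pd.to_numeric` are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical data, loosely modelled on the column names used in this manual.
data = pd.DataFrame({
    "Case": [1, 2, 3, 4],
    "Height": [172, 158, 181, 165],
    "Age": [34, 51, 28, 45],
})

# Sort all rows by increasing values of the "Height" column.
sorted_data = data.sort_values("Height").reset_index(drop=True)

# Convert the numerical "Age" column to text, so it behaves as a categorical variable.
sorted_data["Age"] = sorted_data["Age"].astype(str)

# The reverse conversion: text values back to numbers.
ages_as_numbers = pd.to_numeric(sorted_data["Age"])
```

After the conversion, the "Age" values are strings and would no longer take part in numerical summaries until converted back.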
Convert selected column to numbers
Similarly, text data can be converted to numerical values by clicking on this option.

Add text column...
This option allows users to create new categorical data from a numerical column. A dialogue box appears, and the user can select the new category name. In the example, a “Smoker Category” column was created from the data in the “Number of cigarettes per day” column (Figure 13).

Figure 13. Creating the “Smoker Category” column.

As a result, a new text column is added to the right side of the data (Figure 14).

Figure 14. Data incorporating the newly constructed column.

Delete columns
Finally, one or more columns can be selected and deleted simultaneously by using this option. A dialogue box appears, and the user can select the columns to be deleted (Figure 15).

Figure 15. Deleting a set of selected columns.

3.3.3. Options

Discard first column in csv files
Sometimes, the first column of a CSV file contains information that is irrelevant to the study. In this case, it can be discarded when opening the file by selecting this option.

CSV character separator
With this menu entry, the character separator of the CSV file can be selected. Options are Comma, Semicolon, and Tabulator.

Reset options
By using this entry, all the options are reset to their default values.

3.3.4. About

About GASATaD
It opens a dialogue box with general information about the program.

Figure 16. About GASATaD.

3.4. Left Panel

3.4.1. Basic statistics

GASATaD automatically identifies columns containing only integer values, which can be used to select ranges of data for analysis. In our example, Case, Age, Years of Schooling, Age of initiation in smoking, and Number of cigarettes per day are detected as integer columns, and ranges can be established to perform different analyses (Figure 17).

Figure 17. Detection of integer values (in bold), which allows numerical ranges to be established.
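The idea of detecting integer-only columns and then restricting the data to a numerical range can be sketched in pandas as follows. This is only an illustration under stated assumptions: the helper `integer_columns`, the sample table and the chosen range are hypothetical and are not taken from GASATaD's source code.

```python
import pandas as pd


def integer_columns(table):
    """Return the names of numeric columns whose non-missing values are all whole numbers."""
    names = []
    for name in table.columns:
        column = table[name]
        if not pd.api.types.is_numeric_dtype(column):
            continue  # text (categorical) columns are skipped
        values = column.dropna()
        if len(values) > 0 and (values % 1 == 0).all():
            names.append(name)
    return names


# Hypothetical data: two integer columns, one decimal column, one text column.
table = pd.DataFrame({
    "Case": [1, 2, 3],
    "Age": [34, 51, 28],
    "Height": [1.72, 1.58, 1.81],
    "Gender": ["F", "M", "F"],
})

int_cols = integer_columns(table)

# Keep only the rows whose "Age" lies in a chosen range (here 30 to 55, inclusive).
subset = table[(table["Age"] >= 30) & (table["Age"] <= 55)]
```

In this sketch a column stops qualifying as soon as a single value has a fractional part.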
If any value in one of these columns is changed to a non-integer value, the column is no longer included in this list.

A quantitative analysis of numerical data can be performed by pressing the Basic statistics button. A dialogue box appears, and the user can select one or more numerical columns to analyze. As an example, Figure 18 shows the statistical analysis obtained for the “Age of initiation in smoking” column, for women categorized as severe smokers. The number of cases, minimum, maximum, mean, median and mode are presented, as well as standard deviation, variance and covariance, the 25%, 50% and 75% quartiles, Pearson correlation, kurtosis and skewness.

Figure 18. Basic statistics.

3.4.2. Significance tests

Several significance tests (standard t-test, Welch’s unequal variances t-test, Kolmogorov-Smirnov and Wilcoxon rank-sum) can be used to compare sets of data. GASATaD does not check whether the test selected by the user is the most appropriate for a specific set of data; it is the user’s responsibility to verify the validity of the test applied.

When the Significance test button is pressed, a dialogue box appears in which the options for the analysis can be selected. Once the data and the subsets by category have been selected and the Show Results option has been clicked, results appear on the right panel of the dialogue box (Figure 19).

Figure 19. Significance tests.

3.4.3. Histogram Plot

Data can be plotted in different ways. The first option is the histogram plot. By clicking on the corresponding icon, a dialogue box appears in which users can indicate their preferences, such as title, axis labels, display settings, legend position, x variable and tag (Figure 20).

Figure 20. Histogram plot generation.

The corresponding plot is then generated according to the user’s selection (Figure 21).

Figure 21. Final histogram plot generated with GASATaD.

3.4.4. Scatter Plot

In a similar way, a scatter plot can be generated when the corresponding button is pressed. Users can select title, axis labels, display settings, legend, and x and y (one or more) variables. Furthermore, a linear fit can also be estimated and plotted in the graph (Figure 22).

Figure 22. Scatter plot generation.

Figure 23 shows the final result for the scatter plot.

Figure 23. Final scatter plot generated with GASATaD.

3.4.5. Pie Chart

Pie charts can also be generated with GASATaD. As with the previous plots, title, legend and tag can be selected according to the user’s preferences (Figure 24).

Figure 24. Pie chart generation.

As a result, a pie chart is generated (Figure 25).

Figure 25. Final pie chart generated with GASATaD.

3.4.6. Box Plot

A box plot can be obtained by clicking on the corresponding button of the left panel. A dialogue box appears, and title, display settings and variables can be selected. GASATaD also offers the possibility of grouping the data to be plotted according to the established data categories (Figure 26).

Figure 26. Box plot generation.

Figure 27 shows the box plot corresponding to the user’s preferences.

Figure 27. Final box plot generated with GASATaD.

3.4.7. Bar Chart

Finally, it is also possible to generate bar charts. When this button is pressed, a dialogue box appears that allows users to define title, axis labels, display settings, x variables and tags, and different operations to be applied to the data (Figure 28).

Figure 28. Bar chart generation.

The bar chart generated according to the options indicated in Figure 28 is shown in Figure 29.

Figure 29. Final bar chart generated with GASATaD.

All the plot windows created by GASATaD include a set of buttons that give access to additional functionalities of the graphics (Figure 30).

Figure 30. Tools to manage plots.

These buttons provide the following functions:
● Move the plot along both the x and y axes.
● Zoom in to a selected area of the plot.
● Configure the margins of the plot.
● Undo/redo.
● After modifications, revert the plot to its initial state.
● Plots can be saved in different formats, such as .eps, .pgf, .pdf, .png, .ps, .raw, .svg, .tiff.
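For users running from the source code, exporting a figure in several formats can also be done programmatically with Matplotlib, one of the libraries GASATaD depends on. The sketch below is an illustration only, not GASATaD's own code: the sample values, the file name `histogram` and the temporary output directory are assumptions, and only three of the formats listed above are used.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is required
import matplotlib.pyplot as plt

# Hypothetical data plotted as a simple histogram.
fig, ax = plt.subplots()
ax.hist([21, 23, 23, 25, 30, 31, 35, 40], bins=4)
ax.set_title("Age")
ax.set_xlabel("Years")
ax.set_ylabel("Count")

# Save the same figure in several formats; the format is inferred from the extension.
out_dir = tempfile.mkdtemp()
saved = []
for ext in ("png", "pdf", "svg"):
    path = os.path.join(out_dir, "histogram." + ext)
    fig.savefig(path)
    saved.append(path)

plt.close(fig)
```

Vector formats such as .pdf and .svg are usually preferable for publications, while .png is convenient for slides and web pages.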