RIF V40 Manual
RIF_v40_Manual
User Manual:
Open the PDF directly: View PDF .
Page Count: 43
Download | |
Open PDF In Browser | View PDF |
The Rapid Inquiry Facility (RIF) Version 4.0 How to use the RIF 4.0 client Authors (2017-2019): Parkes, B., Morley, D., Hambly, P Small Area Health Statistics Unit (SAHSU) MRC-PHE Centre for Environment and Health Department of Epidemiology and Biostatistics School of Public Health Imperial College London Medical Faculty Building St Mary's Campus, Norfolk Place LONDON W2 1PG Website www.sahsu.org 1 Contents The Rapid Inquiry Facility (RIF).......................................................................................................................................... 1 Version 4.0 ........................................................................................................................................................................ 1 How to use the RIF 4.0 client ............................................................................................................................................ 1 1. Introduction to the RIF .................................................................................................................................................. 4 1.1 Purpose ................................................................................................................................................................... 4 1.2 Principal Features.................................................................................................................................................... 4 1.3 Current Limitations ................................................................................................................................................. 5 1.4 Input Facilities ......................................................................................................................................................... 5 1.5 Export Capability ..................................................................................................................................................... 6 1.6 Scope of this Manual............................................................................................................................................... 6 2. Background Considerations .......................................................................................................................................... 7 2.1 Disease mapping or risk analysis............................................................................................................................. 7 2.2 Geographical data issues ........................................................................................................................................ 8 2.3 Health and population database issues .................................................................................................................. 8 2.4 Exposure data ......................................................................................................................................................... 8 2.5 Statistics .................................................................................................................................................................. 8 2.6 Interpretation and Limitations ................................................................................................................................ 9 2.7 References ............................................................................................................................................................ 10 3. Starting up ................................................................................................................................................................... 11 3.1 Test data................................................................................................................................................................ 11 3.2 Logging in .............................................................................................................................................................. 12 3.3 RIF mapping tools.................................................................................................................................................. 12 4. Running a new RIF study ............................................................................................................................................. 13 4.1 Study details .......................................................................................................................................................... 14 4.2 Study area ............................................................................................................................................................. 14 4.3 Comparison area ................................................................................................................................................... 16 4.4 Investigation parameters ...................................................................................................................................... 16 4.5 Statistical methods................................................................................................................................................ 17 4.6 Saving and reloading studies ................................................................................................................................ 18 4.7 Study status ........................................................................................................................................................... 18 4.8 Run study .............................................................................................................................................................. 19 4.8 Messages ............................................................................................................................................................... 20 4.10 Reset ................................................................................................................................................................... 21 5. Data viewer ................................................................................................................................................................. 22 5.1 Choropleth map .................................................................................................................................................... 22 5.2 Data table .............................................................................................................................................................. 23 2 5.3 Population pyramid............................................................................................................................................... 24 5.4 Frequency distribution .......................................................................................................................................... 24 5.5 Risk Graphs............................................................................................................................................................ 25 5.5 Info Button ............................................................................................................................................................ 25 6. Mapping ...................................................................................................................................................................... 28 6.1 Choropleth maps................................................................................................................................................... 28 6.2 Disease map charts ............................................................................................................................................... 28 7 Export ........................................................................................................................................................................... 29 7.1 R Scripts................................................................................................................................................................. 30 7.2 Shapefiles .............................................................................................................................................................. 31 7.3 Generated Maps ................................................................................................................................................... 31 7.4 Reports .................................................................................................................................................................. 32 Appendices ...................................................................................................................................................................... 33 Appendix A. Statistical methods ................................................................................................................................. 33 Indirectly standardised risks ................................................................................................................................... 33 Empirical Bayes Analysis ......................................................................................................................................... 34 Full Bayesian smoothing ......................................................................................................................................... 35 R and R-INLA............................................................................................................................................................ 35 Appendix B. Descriptive analysis of Sahsuland ........................................................................................................... 37 Sahsuland population ............................................................................................................................................. 37 Sahsuland numerator data ..................................................................................................................................... 39 References .................................................................................................................................................................. 43 3 1. Introduction to the RIF 1.1 Purpose The Rapid Inquiry Facility (RIF) is an automated tool to allow epidemiologists to rapidly address epidemiological and public health questions using routines collected health and population data. The RIF can perform risk analysis around putative hazardous sources and can be used for disease mapping. It generates indirectly standardised rates and relative risks for any given health outcome, for specified age and year ranges, for any given geographical area. The RIF has been developed by the UK Small Area Health Unit at Imperial College London and funded by the US Centers for Disease Control and Prevention (CDC) and the National Institute for Health Research Health Protection Research Unit. This manual describes version 4.0 of the RIF (last update March 2019). This version of the RIF supports the following browsers: • • • Firefox 62.0 (64 bit) or greater: preferred as it handles the large memory requirements of high resolution administrative geographies (e.g. UK Census Output Area) best; Chrome 69.0 (64 bit) or greater; Microsoft Edge 42.17134 or greater; The RIF will work with Microsoft Internet Explorer (IE) 11.0.9600 or greater. Use of IE is not advised as it will run slowly and crash at medium levels of resolution (e.g. US Counties). 1.2 Principal Features • The system is designed using a three tier architecture. The client runs entirely within the user’s browser and does not require installation of any software on the client’s machine. All data is stored in a secure database on the server with user access, security and data processing performed by the middleware, also running on the server. Figure XXX. The RIF Arctitecture. • In addition to the point source ‘risk analysis’ and ‘disease mapping’ options, it is also possible to import detailed exposure data, such as output from dispersion modelling. 4 • • • • Within the risk analysis tool, the RIF performs test on the relative risks to assess for homogeneity and linear trend with exposure. Within the disease mapping tool, the RIF can perform empirical Bayes smoothing and full Bayes smoothing using the r-INLA library assessed from the middleware. The RIF can export data for further analysis in other (statistical) software packages such as SaTScan and WinBUGS and GIS packages such as ArcGIS and QGIS. Support for Information Governance via database security features (e.g. role-based access control, auditing). Users can only access and utilise data for which they have been granted by the database administrator. 1.3 Current Limitations • • • • • No support for covariates embedded in numerator or denominator data. Data must be extracted into a separate covariate table. Covariates must be merged into a single table and disaggregated (if required) by year. It is not planned to remove these restrictions which were in the previous RIF. External covariates must be quantilised. Support for continuous variable covariates (i.e. on the fly quantilisation) may be added in future releases; Denominator data is always used in indirect standardisation. There is no support currently in the RIF for direct standardisation using standard populations. This was supported in previous versions of the RIF and will be put back if required; No support for ad-hoc SQL. This functionality will be partially re-implemented in future releases using user specified conditions (pre-defined groups). The feature was removed as it cannot be implemented in a secure manner (i.e. it permits SQL injection attacks); Numerators are currently limited to ICD 10 coding only. Support will be added for: o ICD 9 (Autumn 2018); o ICD 11 (subject to the release of the 11th Edition in June 2018). In the longer term it is expected that support will be added for: • • • • • • • • • • ICD oncology (ICD-O-1); UK HES oper and A+E codes; User specified conditions (pre-defined groups), e.g. Low birthweight, complex groups of ICD codes, the all record condition (the 1=1 ad-hoc SQL filter in the previous RIF); The RIF currently lacks complex support for Information Governance beyond having strong role-based permissions. An information governance tool is envisaged to assist; More than one external covariate; More than one investigation; Multiple ICD field names (e.g. as used in hospital episode statistics); Covariate geolevel cannot be of lower resolution than study geolevel; Separate AGE_GROUP/AGE/SEX columns; Support for more than current and previous version of a table outcome (e.g. ICD). This would allow ICD 9, 10 and 11 (or 8, 9 and 10) to be supported all at the same time together the the start and end year for each version in a numerator table. Currently the RIF applies the same ICD filter to all years. This approach may cause problems if there are coding incompatibilities between the version (i.e. the same code means something different in two or more version). 1.4 Input Facilities The RIF will be provided with a Data Loader tool that allows users to import their health and GIS data. For the present there is a RIF Data Loading manual that describes how to manually process and load data into the RIF. 5 1.5 Export Capability The RIF is a versatile tool for generating smoothed disease maps and for calculating relative risks in populations living around putative sources of exposure. There are, however additional software packages that can also be used to explore spatial and temporal trends in data, and to detect statistically significant clusters of disease that many users will wish to employ to aid their investigations. The RIF has been designed to work alongside these programmes and can currently export data in ZIP file format for use in statistical packages (R, WinBUGS and SaTScan), GIS packages via shapefile import as well as to Microsoft Excel for further processing. 1.6 Scope of this Manual This manual explains how a user, typically an epidemiologist, would use the RIF client to set up a new study, run the study and examine the results. In addition, there is an appendix covering the statistical methods used by the RIF when a study is run. 6 2. Background Considerations This chapter will give a very brief overview of some of the considerations that should be made when planning, undertaking and interpreting a RIF study. These considerations are not unique to studies undertaken using the RIF, and although the RIF will help to speed up point source and mapping studies, users are cautioned to plan RIF studies as carefully as they would any other epidemiological investigation they would undertake. More details on these issues can be found in the following papers: Beale L, Hodgson S, Abellan JJ, LeFevre S, Jarup L, 2010. Evaluation of spatial relationships between health and the environment: The Rapid Inquiry Facility, Environmental Health Perspectives. Doi:10.1289/ehp.0901849 Beal L, Abellan J, Hodgson S, Jarup L, 2008. Risk assessment using spatial epidemiological methods, Environmental Health Perspectives. Volume 116, number 8. Ball W, LeFevre S, Jarup L, Beale L, 2008. Comparison of different methods for spatial analysis of cancer data in Utah, Environmental Health Perspectives. Volume 116, number 8. 2.1 Disease mapping or risk analysis The intention is to provide two types of study in the RIF, disease mapping and risk analysis. Initially the RIF v4.0 will only have disease mapping available. The disease mapping approach can be used to visualise mortality/morbidity rates and risks across and area. Disease mapping can provide an invaluable tool to explore spatial patterns of health outcomes; identify potential issues regarding data quality by geographical area; and identify areas which need additional resources or remediation. The risk analysis approach can be used to explore whether a source or some particular exposure (risk factor) is having an impact on health in a local population. To carry out a risk analysis study the geographical position of the putative risk factor will need to be known (as a point or a plume for example), and some consideration should be given to what distance the exposure of interest might be expected to have an impact. Thought should also be given to whether the exposure is likely to have a short or long term effect, as this will determine which years of health data will be most appropriate to study. Careful consideration should always be given to the most appropriate scale of investigation, which will depend on local circumstances (i.e. population density), and on the outcome of interest (i.e. whether this is a very rare outcome or not). The most appropriate geographical resolution to be used in any particular study will depend on individual circumstances and is often a compromise between having a high enough resolution to allow differences in disease risk to be assessed by small area, and having a large enough area (or population) to ensure that disease rates are sufficiently stable to permit interpretation. When mapping a rare disease across a sparsely populated area, thought should be given to the value of mapping at the smallest units available; if these units lead to very unstable risk estimates due to small populations, it may be preferable to lose some of the geographical resolution to gain more stable disease rates. While there may be a basis for investigating the population living in very close proximity to a putative pollution source, thought should be given to whether the size of this ’exposed’ population is sufficient to provide a meaningful risk estimate. When assessing potential disease clusters pot hoc, special care must be taken to avoid the ‘Texas sharpshooter’ effect, where the cluster is tightly defined in space and time, thus minimising the population at risk, and maximising the excess risk. 7 2.2 Geographical data issues There are many different types of enumeration areas (e.g. administrative, health, electoral, postcode etc.) and frequently their boundaries do not align. To use the RIF, however, the geographical data for any study must be hierarchical, with the boundaries at higher resolution areas being subdivisions of the larger areal units. In most countries census data are hierarchical. Since these boundaries tend to be defined administrative boundaries rather than physical boundaries, the boundary locations can, and do, change over time. Area name and codes can also change, which can be further complicated by the fact that different government departments can develop different coding systems for administrative geographies, or use slightly different names for the same area. Inconsistent geography is problematic for any temporal studies that span time periods when boundary changes have occurred and are a major problem when trying to produce and compare meaningful statistics over time. The Modifiable Areal Unit Problem (MAUP), as it is known, can affect any spatial study that utilises aggregate data sources (Openshaw, 1984). Since enumeration areas are often arbitrary and can change spatially and temporally, they are said to be ‘modifiable’. Many spatial datasets are collected at a fine resolution (i.e. a large number of small spatial units) but are released only after being spatially aggregated to a coarser resolution (i.e. a smaller number of larger spatial units). This is usual for census data which are collected from every household, but released as aggregated data for an enumeration area. When values are averaged during the process of aggregation, variability in the dataset is lost and values of statistics computed at different levels of spatial resolution will be different. This change is called the scale effect. The aggregation or zonation effect must also be considered, which occurs due to variation in numerical results that can occur due to the grouping of smaller areas into larger units (e.g. enumeration areas into census tracts). If EAs were grouped into zones of similar size to census tracts, but in a different spatial arrangement, it is likely to produce different statistical results between the two groupings of data. Problems related to the ecological fallacy should also be considered. Users should be wary of interpreting results solely from aggregate statistics and making assumptions about the nature of individuals from data that relates to groups. 2.3 Health and population database issues The appropriate statistical techniques and tools are available to calculate and map small area risks, but meaningful results can only be achieved if the underlying health and population data are accurate and complete. Local variations in ascertainment of health data, changes in health event recording over time (e.g. adoption of a new ICD revision), errors in the denominator (population) data (e.g. due to migration), or incomplete/inaccurate geocoding of either health or population data (e.g. greater positional errors for rural than for urban addresses) may introduce spurious temporal or spatial patterns in risk. Any underlying data problems are not corrected merely by running the analysis through the RIF. It is vital that any data quality issues are known about, dealt with where possible, and where issues remain, that these are considered fully when interpreting the results. 2.4 Exposure data 2.5 Statistics One problem associated with investigating health risks in small areas is that small populations have a small number of expected and observed events, which can lead to unstable risk estimates. This can result in misleading risk maps, especially if the area with the smallest populations are quite large (rural areas for instance), as these areas with the least stable risk estimates can dominate a map. In an attempt to overcome this problem and aid interpretation of the disease mapping output, the RIF can perform both empirical and full Bayesian smoothing of the raw, relative risks to account for sampling variability in the observed data. These methods can allow more meaningful risks to be calculated at the small area level; however these statistical techniques need to be applied with due consideration and caution. While raw risks can produce noisy maps that are difficult to interpret, over-smoothed maps may produce a 8 homogenous risk surface. Obviously there is a trade-off between high sensitivity (where true high risk areas can be identified), and high specificity (where areas of no excess risk are correctly identified) (Richardson et al., 2004). The RIF calculates standardised mortality (or incidence/morbidity) ratios (SMRs), however thse measures are not directly comparable between different exposure groups as they are not based on the same standard population (i.e. the age, gender and socio-economic make up between the populations being compared are not exactly the same). This should only result in misleading comparisons where the population structure is significantly different between the groups being compared (Goldman and Brender, 2000). An alternative to using indirectly standardised measures would be to use directly standardised rates and assess comparative mortality figures (CMFs) (or incidence/admissions figures) (Julious et al., 2001). The use of CMFs is advised for studies in which there are substantial numbers of cases in each study area or exposure category; however at the small geographical level, the number of cases is usually so few that directly standardised rates are unstable and the imprecision of this measure makes comparisons very difficult. In such situations it is appropriate to use SMRs instead of CMFs, provided the stratum specific death rate for each exposure class are proportional to the standard population rates, and bearing in mind that the rates in each exposure group may not be directly comparable with each other (Jarup and Best, 2003). Given that the it is intended for use at smaller geographical levels, the RIF uses indirect standardisation to calculate SMRs and does not currently perform direct standardisation. Currently the RIF does not perform any type of temporal analysis. If users are interested in time trends in rates or relative risks, they might use the RIF to explore trends by running several annual (or other time length) periods and then plotting the rates/risks obtained. This would be in spirit similar to moving average analysis. Although this could be valid for explorative purposes, users should be aware that it is not a proper moving average analysis, and therefore it lacks their properties, hence results should be interpreted carefully. A full description of the statistical methods performed by the RIF is given in appendix A. 2.6 Interpretation and Limitations Crucial to effective communication of spatial information is the use of suitable mapping techniques that convey results objectively. Effective mapping requires both an understanding of the mapped phenomena as well as the mechanisms to present the data appropriately. This is particularly true for maps that display data related to epidemiological risk in order to avoid misinterpretation or to over- or under-emphasise particular results. The map displays in the RIF can be configured with a wide range of base maps available. Additionally; it may be preferable for the user to export the data in order to import it into a GIS system with greater data symbolisation capabilities. The main advantages of undertaking spatial epidemiology at the small rather than large area level is increased interpretability – small-area studies are less susceptible to ecological bias created by within-area heterogeneity; they also allow local effects (such as impacts of point sources of pollution) to be investigated (Elliot and Wartenberg, 2004). While analysis at the small area can help reduce components of ecological bias, unless the analysis is carried out at the individual level it is impossible to rule out this bias entirely. Factors associated with disease in individuals (Morgenstern, 1998), and while the RIF can help assess whether a reported cluster is statistically significant or can demonstrate spatial trends in disease risk, the RIF cannot infer a causal relationship between an environmental factor and a disease. If cause for concern around a particular site is confirmed, data should be checked and validated (for completeness, diagnostic accuracy, etc.). Replication around other or multiple sites with similar discharges (if they can be found) can be carried out or indeed etiologic studies at the individual level can be designed and carried out. It should always be remembered that the RIF-type studies are subject to the limitations outlined above, and the user should therefore always consider what impact inconsistent geography, health and population data, exposure misclassification, ecological bias, and so on, will have on the study output. The RIF output therefore needs to be interpreted with caution and with expert local knowledge. 9 2.7 References Elliot P and Wartenberg D. 2004 Spatial epidemiology: current approaches and future challenges. Environmental Health Perspectives 112(9):998-1006. Goldman DA & Brender JD. 2000. Are standardized mortaility ratios valid for public health data analysis? Statistics in Medicine 19(8):1081-1088. Jarup L & Best N. 2003. Editorial comment on Geographical differences in cancer incidence in the Belgian Province of Limburg by Bruntinx and colleagues. European Journal of Cancer 39(14): 1973-1975. Julious SA, Nicholl J & George S. 2001. Why do we continue to use standardized mortality ratios for small area comparisons? Journal of Public Health Medicine 23(10):40-46. Openshaw S. 1984. The Modifiable Areal Unit Problem, CATMOG, Concepts and Techniques in Modern Geography, No 38, Norwich, GeoAbstracts. Morgenstern H. 1998. Ecological Studies, in Modern Epidemiology, Second Edition, KJ Rothman & S Greenland, eds, Lippincott Williams & Wilkins, pp.459-480. Richardson S, Thompson A, Best N et al. 2004. Interpreting posterior relative risk estimates in disease-mapping studies. Environmental Health Perspectives 112(9):1016-1025. 10 3. Starting up 3.1 Test data Before using your own data, we recommend using the sample health, population and geography data sets provided with the RIF software. These data give an idea of how the RIF works and help indicate what format data need to be in before they can be used in the RIF. The test data are automatically installed with RIF software, and in this version of the RIF/manual these data relate to a fictitious area known as Sahsuland. NOTE: all these datasets are fictitious and may not reflect patterns observed in reality. The data consist of: • • • Population data (by five year age group by gender1), for the period 1989-1996. Cancer incidence data for the period 1989-1996. Covariate data2 on socio-economic status, ethnicity, and proximity to Toxic Release Inventory (TRI) sites. The example dataset ‘Sahsuland’, supplied with the RIF software, can be used to test the software setup and as a template for database construction. Sahsuland is approximately 32,869 km2. The area of Sahsuland (Figure 1) uses for different hierarchical enumeration areas or levels of geography. Each area can be identified by a unique ID value. This also follows a hierarchical form, so that LEVEL2 areas are unique by LEVEL1 area, LEVEL3 areas are unique by LEVEL 2m and so on. A unique ID at the highest resolution of level 4 is a combination of the level 1 ID, the level 2 ID and the level 3 ID, and follows the system used by many countries for their census data (e.g. FIPS in the USA, Output areas in the UK – see Table1). Table 1. The census areas in Sahsuland Sahsuland Level 1 Level 2 Level 3 Level 4 Administrative area UK District Standard table Wards (ST Wards) Tract Super Output Areas (SOAs) Census Block Group Census Output Areas (OAs) USA State County Canada Province (PR) Census Division (CD) Census-subdivision (CSD) Dissemination Area (DA) All level IDs are stored as text values with LEVEL1 areas using two characters, LEVEL2 use 3 characters, which is joined with the LEVEL1 unique ID to make a LEVEL 2 unique ID of 2 and 3 characters separated by a dot. The Level 3 units use a 6 character value and the LEVEL4 is a single character. Again, unique IDs for each region are achieved by concatenating each lower resolution area such that the proceeding level falls within each separated by a single dot. Note. The data formats described in this section refer to Sahsuland data only. These data formats are not a requirement by the RIF. Data requirements are covered in the RIF Data Loading manual. 1 Age groups are actually by one year age group for ages 0 to 4, then by five year age groups from ages 5 to 85, e.g. age groups 0, 1, 2, 3, 4, 5-9, 10-14, …,80-84, 85+ 2 The RIF can handle ecological level covariate data 11 Screen shots and examples in this manual are based on this Sahsuland data. Descriptive statistics of Sahsuland can be found in appendix B. 3.2 Logging in Your RIF administrator should provide you with your user name, password and the correct URL to access the login page of your RIF installation. Type the URL in the address bar of your web browser and log in using your username and password. Figure XXX. The RIF login screen. 3.3 RIF mapping tools The RIF uses an internet browser based map viewer to display and select study areas and to visualise results. These maps work in the same way as conventional map containers (e.g. Google Maps), the main difference being they use open source map data. The following controls are common to all RIF maps. Other tools are specific to certain RIF functionality and these will be dealt with in the relevant sections. + Zoom in: Zoom map in. - Zoom out: Zoom map out. Quick export map: Save the current map in view as a png file. Zoom to selection: Zoom the map to the currently selected districts(s). 12 Zoom to study extent: Zoom the map to fit all districts used in this RIF study. Zoom to full extent: Zoom the map to fit all districts in the current geography. Clear selection: Deselect all currently selected districts. Transparency: Change the transparency (opacity) of the district layer being mapped Enter address: Zoom the map to a place name, geographical feature etc. Full screen: Display the map in full screen mode (Esc to exit) ? Attribution: Attribution (source, copyrights) information for the map layers Hide or show selection shapes: Display the shapes used to select the study. Green when displayed Base map: Opens the base map selection where the base map can be changed or removed. Figure XXX. Base map settings. 4. Running a new RIF study To create, then run a new RIF study, five steps will need to be completed: • • • • • Enter study details such as ‘geography’ and study type Defining the study area Defining the comparison area Set the investigation parameters Decide the statistical methods to be used These steps can all be completed under the Study Submission tab. At any point all the details of the study can be cleared by clicking the reset link. 13 Figure XXX. Study Submission screen 4.1 Study details The study details can be completed using the fields along the top of the study submission tab Study Name. The study must be given a name which is types into the study name field. The name cannot exceed 20 characters in length. Health Theme. This is defined during data loading as a means to group your relevant data sets together for ease of use. This will usually relate to a disease type, e.g. cancers. Geography. Select the appropriate geography from the drop-down list of geographies (this will usually relate to the either the country in which your study is for or to a predefined representation of districts e.g. tracts, wards). The list consists of all the available geographies which you can access. Numerator. Select the health outcome you are interested in mapping. Denominator. The data to be used as a denominator. This cannot be changed and is auto-selected depending on which numerator table is being used. Relevant numerator-denominator pairs are decided in the data loading process. • If you hover the mouse over the field name a detailed description will be displayed if available. 4.2 Study area Clicking the study area link will load the study area selection screen 14 Figure XXX. Study area selection screen The first thing to do is to set whether this is a disease mapping study or a risk analysis study using the switch at the top-left of the window. (See section 2.1 for more information on these study types). Initially the whole geography is displayed in the map area at the default level of resolution. The purpose of this screen is to select which areas of the geography are to be investigated. For risk analysis, one to six bands (of multiple districts) may be specified. Disease mapping studies are not banded in this way, so only one selection band is available as default. Area selection/deselection can be performed in a number of ways: • • • • • Use the band drop-down to specify the band number Clicking directly onto the area in the map Selecting areas from the list displayed on the left side of the screen Selecting areas by defining a freehand polygon using the button Select area within concentric bands using the button. Note that selections using the boundary of another polygon based on intersection with a district's centroid. The icon will toggle visibility of these centroid locations off and on. The icon will select all the districts within a geography By clicking the 'Upload from list' button, a comma deliminated (csv) file can be used to select a predefined list of districts and bands. This must have just thee columns ID, NAME and Band; e.g. seer_mainland_states.csv. ID,NAME,Band 01779778,California,1 01779780,Connecticut,1 01705317,Georgia,1 01779785,Iowa,1 01779786,Kentucky,1 01629543,Louisiana,1 01779789,Michigan,1 01779795,New Jersey,1 15 00897535,New Mexico,1 01455989,Utah,1 01779804,Washington,1 A zipped shapefile can also be used to define study areas. The select by shapefile icon brings up the open shapefile dialogue. The file can be points or polygon (see below) and must be in a zipped folder with extension .zip. The select by postal code/WGS/grid coordinates icon allows the user to enter a single point as a postal (or ZIP) code, WGS 84 (GPS) coordinate or using national coordinates. Postal codes are only available if the necessary lookup data has been loaded and setup (see the data loading manual). The Hide or show selection shapes icon allow the user to display the shapes used to select the study. It is green when shapes are displayed Once displayed on the map, the uploaded layers can be removed with the • • • clear selection icon. Multiple layers are supported. When loading point shapefile (e.g. incinerator locations), radii for exposure band(s) in metres need specified. When loading a polygon shapefile (e.g. output from an exposure model). Selections can be defined by the full extent of the areas with the shapefile, by a cut-off from an attribute within the file (e.g. a threshold value for a pollutant). This may be multiple (descending) cut-offs for risk analysis according to bands. In addition, in the case of a risk analysis study, if the polygons have a band attribute, this may be specified as well. Clicking ‘done’ will store the selected study regions which define the study area and return the user to the study submission tab. The study area ‘tree’ should now be coloured to indicate that this part of the study submission is complete. 4.3 Comparison area Clicking the comparison area tab on the study submission tab will load the comparison area selection screen for defining areas (populations) for the calculation of indirectly standardised risks. The comparison area screen is very similar to the study area selection screen with all the same methods of selection study regions that will for the comparison area. A comparison area is not banded in the same way as the study area, so by default only one selection band is possible. Note that the type of study can only be defined via the study area window. Clicking ‘done’ will store the selected study regions which define the comparison area and return the user to the study submission tab. The comparison area tree should now be coloured to indicate that this part of the study submission is complete. 4.4 Investigation parameters Clicking the investigation parameters link brings up the investigation parameters selection screen. 16 Figure XXX. Investigation parameters screen. ICD10 codes of interest can be selected from the list and search box on the left hand side of the screen. Selected codes appear in the lower right side of the screen. An investigation name can be edited in the investigation name field. The covariates section allows the user to select male, female or both sexes and define one further covariate field. The range of ages and years the study is to cover are chosen in the age range and year range sections respectively. Clicking ‘done’ will store the selected parameters and return the user to the study submission tab. The investigation parameter tree should now be coloured to indicate that this part of the study submission is complete. 4.5 Statistical methods Clicking the statistical methods link brings up a screen allowing selection of the statistical methods the RIF will for the disease mapping process. By default, the RIF always calculates the indirectly standardised rates and the relative risk ratios as well as performing Empirical Bayesian Smoothing. Either select one of the available procedures or the option not to apply smoothing if you do not to run n additional Bayesian method. For risk analysis only the default is supports, the user cannot perform an additional Bayesian method. Details of the basic statistical methods and the full Bayesian smoothing options along with external references are included in the Technical Appendix. Clicking ‘done’ will store the methods selected and return the user to the study submission tab. The statistical methods tree should now be coloured to indicate that this part of the study submission is complete. 17 Figure XXX. Statistical methods selection screen. 4.6 Saving and reloading studies At any point during the process of defining a new study submission, the details of the study can be saved locally on the user’s machine by clicking the save study link. This brings up a save as dialog box allowing the user to select a local folder and file name, then click ‘save’ to write a local copy of the study. Figure XXX. Open from file dialog box for reloading a study during the submission process At a later date the user can load the study by clicking the open from file link and navigating to the location of a locally stored file and clicking open. Depending on the browser being used, the name of the file can be changed. The file is in “human readable” JSON5 format and can be edited with care. Saved study setup files are also produced as a part of the export to ZIP functionality. 4.7 Study status The status link in the bottom, centre of the study submission screen brings up the submission status screen which lists all the user’s studies that have been previously submitted and are still available to access on the system. It also shows any studies that have recently been submitted but are not yet fully processed yet. The study state column provides the user with a short code to identify the status of the study: • • Study state: S means the study results have been computed and are ready to be used. Study state: F means the study failed in the R code. The trace button will display the R error in a popup. 18 • • Study state: C means the study has been created but not verified. Study state: E means the study has been extracted but results have not been generated. Entries that are highlighted in pink are not available for mapping as they are currently being processed or have failed to be run. Figure XXX. Submission status screen 4.8 Run study Once the user is happy that all the study submission details are complete, the run study link can be clicked which brings up the run study screen. Here an optional description can be added and the details of the study can be checked by clicking the view study submission summary link. The study will be submitted when the user clicks ‘run’ and a message will be displayed: ‘Success: study submitted’. Figure XXX. Run study screen 19 Once the study has been submitted, the use will be returned to the study submission screen. The study may take some minutes to run depending on the size and complexity of the study. Clicking the status link allows the user to see the status of any recently submitted studies. When the status is listed as R the results of the study can be viewed under the data viewer tab. View study submission summary provides a report of the study as submitted Figure XXX. View study submission summary screen 4.8 Messages RIF messages appear as strips from the top of the screen and disappear after five seconds unless they are a serious error and related to submitting or running a study. The messages screen allows the user to view the messages for the session. 20 Figure XXX. Messages screen 4.10 Reset The reset button clears the study selection. 21 5. Data viewer The results of a study that has been submitted and run can be examined and analysed in detail under the data viewer tab. The header of the data viewer has two dropdown boxes allowing the user to select the study to view and to filter the results by sex. The main data viewer area is made up of four sections. Moving clockwise from top right, these are: • • • • configurable choropleth map of the study area data table of the regions included in the study area population pyramid of the whole study area frequency distribution of the outcome currently displayed in the choropleth map. Figure XXX. An example disease mapping study in the data viewer tab. 5.1 Choropleth map The array of buttons above the map give the user several options to navigate the map and configure what information is displayed. The choropleth map icon brings up the choropleth map symbology screen which allows the user to select which field is displayed on the map and how the values in the field are represented using colours on the map. 22 Figure XXX. Choropleth map symbology screen. The colour scale dropdown selection lets the use choose from many different colour scales as defined by the Color Brewer project (http://colorbrewer2.org). The field to map dropdown lets the user select which data field is displayed on the map and in the frequency distribution graph. The intervals dropdown lets the user define the number of breaks in the data being displayed (the maximum is 11 but some colour scale have a lower limit). The classification dropdown lets the user define how the breaks in the data are defined. In each case the number of classifications is defined by the intervals dropdown. • • • • • • Quantile. Divided the data into quantiles with the same number of data points in each quantile. Equal interval. The difference between the lowest and highest values is divided equally into categories. Jenks. Uses the Jenks natural breaks classification method designed to determine the best arrangement of values into different classes (Jenks, 1967). Standard deviation. Calculated the mean and standard deviation of the underlying data, then divides the data into 5 categories (>2 standard deviations above/below mean, >1,<=2 standard deviations above/below mean, <=1 standard deviation away from mean). Always 5 categories. Atlas relative risk. The classification system used when displaying relative risks in the SAHSU Environment and Health Atlas (ref). Always 9 intervals. Atlas probability. The classification system used when displaying probabilities in the SAHSU Environment and Health Atlas (ref). Always 3 intervals. Additionally, the breaks defined by the classification method can be manually edited in the edit breaks section. 5.2 Data table The data table shows one row for each of the regions that make up the study area as selected during the study submission process. Clicking on rows in the data causes the corresponding regions to be highlighted in green on the choropleth map. The data table shows the population and health data (Area Id, Ban Id, Observer, Population, investigation id); basic statistics (Expected, Adjusted, Relative Risk, Lower95, Upper95); and the results of the Bayesian smoothing (Posterior probability, Smoothed Smr, Smoothed Smr Lower95, Smoothed Smr Upper95). 23 The data can be sorted ascending or descending by clicking on the column headings. There are filter boxes directly under the column names. Typing in a filter box will filter the results displayed in the data table. Note that the filters work using string filtering, i.e. typing 10 in the ‘Band Id’ filter will show all the rows that have the string ‘10’ in the band id (e.g. 10, 100, 101, 110). The link to the right of the column heading gives the user an additional three menu options: • • • Clear all filters. Removes an filter strings that have previously been entered. Export all data as csv. Allows the user to save a comma separated variable (csv) file of all the study data. Export visible data as csv. Allows the user to save a comma separated variable (csv) file of all the records currently in the data table. Clicking on regions in the choropleth map or selecting the rows in the data table sets the value for that row in the ‘selected’ column to 1 (as well as highlighting the row). By filtering the data table so that it only shows the records where ‘selected’ is 1, then choosing ‘export visible data as csv’, the user is able to effectively make a manual selection of regions in the map and export only the data associated with those selected regions. 5.3 Population pyramid The population pyramid section displays a population pyramid showing the age distribution of the residents of the geography from which the study is taken. The dropdown box in the top right of the population pyramid shows the years for which there is population data and allows the user to view the population pyramids for the available years. Figure XXX. Population pyramid. 5.4 Frequency distribution For disease mapping studies the frequency distribution histogram shows the distribution of the data field currently being displayed in the choropleth map. 24 Figure XXX. Distribution histogram 5.5 Risk Graphs For risk analysis studies the risk graph shows the distribution of the data field currently being displayed in the choropleth map. Figure XXX. Risk Graph 5.5 Info Button The data viewer, mapping and info screens all have an info button to the right of the gender chooser: This allows the user top select different reports for a study: 25 1. 2. 3. 4. Summary Covariate Loss Report Homogeneity Tests Risk Graphs Option 2 requires the study to use covariates and is stratified by covariate name. It also provides information on extract verification. Options 3 and 4 are for risk analysis studies only. Option 4 allows the users to interact with the risk graph. Up to three risk factors can be displayed: band, overage exposure and distance from source. To view one gender, set both gender selectors to the same choice. Hovering over the value circle displays the confidence limits. Figure XXX. Study Summary 26 Figure XXX. Covariate Loss Report Figure XXX. Homogeneity Tests Figure XXX. Risk Graphs with confidence limit displayed 27 6. Mapping The mapping tab allows the user to compare two studies side-by-side or different data from the same study in two different maps. The mapping screen is divided vertically in two to give a left display and a right display. The header for each display allows the user to select the study and sex of the data displayed in the area below. It is usually used for disease mapping. Different studies may be compared but they must share the same geography. Figure XXX. Disease mapping tab. 6.1 Choropleth maps Choropleth maps are displayed for both the left and right display with the same navigation and selection functions as the choropleth map in the data viewer tab (see section 5.1). In addition there additional functions to facilitate the simultaneous use of two maps. Lock and unlock selection. When locked, clicking on the map region in the left display will also highlight the same region in the right display and vice-versa. If the studies selected for the left and right displays have been defined at different geographical levels, lock selection will not work and a warning will be displayed. Lock and unlock map extents. When locked, both maps will always display the same extents. Zooming or scrolling on one map will cause the other map to move such that both maps display the same area. Copy symbology. Copies the symbology settings from the left display to the right display. The Hide or show selection shapes icon allow the user to display the shapes used to select the study. It is green by default when shapes are displayed Maps are selection Lock and extent Locked by default. 6.2 Disease map charts For disease mapping studies the charts displayed below the maps summarise the risk field data across the whole study area. The charts display all the values of the risk field show in the map above as well as the upper and lower 28 confidence intervals. The data in the chart is ordered from lowest to highest risk (moving left to right). Clicking on a point in the chart moves the red line to that data point, displays the risk and confidence intervals above the chart and selects the same region in the map above. Similarly, clicking on the region in the map moves the red line to the equivalent data point in the chart. Figure XXX. Disease map chart. The smaller chart displayed below the main chart acts as a navigation panel for the main chart above. By moving the mouse so it is above the left or right edge of the shaded area in the navigation panel, the user can click and drag to make the shaded area narrower or wider. This will increase or decrease the zoom in the main chart above. Figure XXX. Disease map chart showing how to alter the zoom. For risk analysis studies the risk graphs are shown for males and females. 7 Export The export tab allows a user to export the results of a completed study as a zip file. Select the study to display using the dropdown at the top of the page. Initially a preview of the extract (input data) and map (result) tables are shown. Enter a range of rows and click the refresh button to preview further rows. The map container shows either the study or comparison area. On clicking 'Export study tables', the full map and extract tables are downloaded as csv files and the study and comparison areas are downloaded in GIS format (geoJSON) at the specified detail level. All files are saved in a zipped folder prefixed with the study name and date in your specified output folder (see the RIF set up instructions for how to change this). Then button changes top ‘Exporting’ whilst the export is underway. When the export is completed the button changes to ‘Download Study Export’. ‘Save completed study’ allows the user to save the completed study setup as a JSON5 file suitable for upload an modification in another study. 29 Figure XXX. Export tab. The export ZIP file contains the following data: • • • • • • • • • Study extract, results and adjacency matrix in CSV form; R scripts to re-run statistics phase Shapefiles of results and the 2nd geolevel (e.g. state boundaries) Results as geoJSON; Geography: study and comparison areas as JSON and as shapefiles; Maps: data viewer and left and right disease mapping panes; Reports: denominator population pyramids by year; Saved study JSON5 file (to re-run study); An HTML report to integrate the above data. 7.1 R Scripts Supplied R scripts allows users to re-run statistics phase by running rif40_run_R.bat in the data directory Figure XXX. RE-running the R scripts. 30 7.2 Shapefiles ESRI standard shapefile are part of the export which contain males, females and both (males and females) data: • • • • Column names are shortened to fit DBF file rules; Results are rounded to 2 decimal places for sane quantizing; Per map styling is supplied as style layer descriptors (.sld) files so maps can be easily re-created in GIS tools. Beware the .sld file does not contain a filter. The user must supply their own for males, females or both; Projection used is the original administrative boundary projection (i.e. usually the normal for the country). Figure XXX. Creating a map from a shapefile. 7.3 Generated Maps The Zip file contains generated maps with excellent resolution to aid the user in producing their own maps. Figure XXX. Supplied maps. 31 Maps are produced in the following formats: • • • • • PNG; GeoTIFF. Can be used as a raster layer in GISTools. (.prj and .tfw files World Map format files are also created for each map). GeoTIFF can have copyright embedded; SVG. This is currently a single layer and therefore not easily editable; JPEG; EPS (encapsulated PostScript) and Postscript. Map by default are 7480 pixels wide @100 dpi (Elsevier full page: can be changed). The extent of the study is only is mapped. Maps are scaled with 3% extra margins and 10-50% left margin (for the Legend) depending on the aspect ratio, and then expanded to the grid resolution (e.g. 10 degrees). The grids can be turned off. The projection used is the original administrative boundary projection (i.e. usually the norm for the country). Printing setup is managed system wide. To manage the printing setup see section: 8.4.2 Printing Defaults of the RIF Web Application and Middleware Installation manual. 7.4 Reports Population denominators are provided as high-quality graphics in both tree and pyramid forms: Figure XXX. Tree population pyramid. Figure XXX. Population Pyramid. 32 Appendices Appendix A. Statistical methods Disease mapping Disease maps aim at representing the geographical distribution of the incidence of disease. In the frame of the RIF software, only maps produced with counts of data are considered. Counts of disease cases are reported for a list of regions, denoted here areas, and delimited by geographical boundaries. The easiest way to map the geographical variations of the disease would be to directly map the counts. However, these counts depend strongly of the age-sex composition of the population at risk within each area, and cannot be directly compared. Consequently it is necessary to apply the use of standardisation and standardised rates to exclude the effect of populations. Standardisation requires the definition of a ‘comparison’ population associated to each area. The comparison population may be the total population of all study areas, or subsets of study areas. Standardised disease rates in populations can be calculated using direct and indirect standardisation. Direct standardisation involves applying the disease rates found in the study areas to a standard population. This is not always available so, currently the RIF version 4.0 does not attempt direct standardisation of disease rates. When applying indirect standardisation, the standard disease rates from a comparison population are applied to the study population to give the expected disease counts such as the standardised mortality rates (SMR) or the standardised incidence rates (SIR). Indirectly standardised risks The standard disease rates are taken from the regions defined by the comparison area when the study was submitted. Cases occurring in the comparison population are located for male and females in each five year age band and per covariate. Standardised rates of disease in the comparison population are calculated for each gender, age and covariate by dividing the number of cases in each group by the total population in each corresponding gender, age and covariate stratum. The comparison area populations need to be large enough such that the age-sex-covariate specific disease rates are reliable. These standardised rates (rj*) are applied to the study area population strata (j) to calculate the expected counts (Ei), 𝐸𝑖 = ∑ 𝑁𝑖𝑗 𝑟𝑗∗ 𝑗 Where Nij is the study population in strata j of area i. The SMR for the study population(s) is then simply given by the ratio of the observed (Oi) to expected counts (Ei): 𝑆𝑀𝑅𝑖 = 𝑂𝑖 𝐸𝑖 Values of the relative risk larger than one indicate an excess of risk relatively to the underlying ‘comparison population’, whereas values smaller than 1 indicate a deficit of risk. Since each observation, is divided to the expected counts given the structure of the population, this variable has no unit, and comparisons between areas can be done. The risks obtained for two or more study populations (e.g. different ‘bands’ of exposure around a putative source of pollution), should not be directly compared as they are not based on the same standard population (i.e. the age, gender and covariate make up between the population being compared are not exactly the same). The uncertainty associated with the SMR estimate is quantified by calculating the 95% confidence. Here we note that relative risk 𝑅𝑅𝑖 is the parameter of a Poisson distribution, 𝑂𝑖 ∼ 𝑃(𝐸𝑖 𝑅𝑅𝑖 ). • If 𝑂𝑖 < 100, confidence intervals are found by using the Chi-squared method. 33 𝛼 For instance, if 𝑅𝑅𝑖𝑈 is the upper bound, then, by definition, for any 𝑋𝑖 ∼ 𝑃(𝐸𝑖 𝑅𝑅𝑖𝑈 ), 𝑃(𝑋𝑖 > 𝑂𝑖 ) = 2 . 2 We know that 𝑃(𝑋𝑖 ≤ 𝑂𝑖 ) = 1 − 𝑃(𝑌 ≤ 2 𝐸𝑖 𝑅𝑅𝑖𝑈 ), with 𝑌 ∼ Χ 2𝑂 . Consequently, 𝑖 +2 o 2 𝜒2𝑂 +2 1 upper 95% CI = 2𝐸 𝑞𝛼/2 𝐼 𝑖 o lower 95% CI = 2 𝜒2𝑂 1 𝐼+2 𝑞 . 2𝐸𝑖 1−𝛼/2 For confidence level 100(1-α)%. • If 𝑂𝑖 ≥ 100, a Gaussian approximation of the log relative risk is done, log(𝑂𝑖 /𝐸𝑖 ) is assumed to follow a Gaussian distribution with mean log(𝑅𝑅𝑖 ), and variance 1/𝑂𝑖 . Then, 𝑂𝐼 /𝐸𝐼 o lower 95% CI = o upper 95% CI = 𝐸𝐼 × exp (1.96√𝑂 ) exp(1.96√1/𝑂𝑖 ) 𝑂 1 𝐼 𝑖 Empirical Bayes Analysis The maps of the standardised mortality or incidence ratio may lead to misinterpretations, since the extreme values are more often the consequence of small counts than a true extreme relative risk. Consequently, a non-significantly positive standardized mortality risk may be higher than a significant one for which the population at risk is higher. To reduce the influence of the small counts, Clayton and Kaldor (1987) proposed empirical Bayes estimates of the relative risk. They are based on a Poisson-Gamma hierarchical model. By accounting for differential variability in the data, this hierarchical approach provides more precise estimates of relative risk and more accurate assessments of significant changes that the standard methods. The estimates are smoothed toward a global value by assuming that all the relative risks are sampled in the same gamma distribution. Moreover, the smaller the count, the stronger the shrinkage effect. In detail, relative risks 𝑅𝑅𝑖 are assumed to come from a single Gamma distribution of scale 𝛼 and shape 𝛽, 𝑂𝑖 ∼ 𝑃𝑜𝑖𝑠𝑠𝑜𝑛(𝐸𝑖 𝑅𝑅𝑖 ) 𝑅𝑅𝑖 ∼ 𝐺𝑎𝑚𝑚𝑎(𝛼, 𝛽). Approximation of the posterior mean of relative risk 𝑅𝑅𝑖 are given by the empirical Bayes estimates, 𝐸(𝑅𝑅𝑖 |𝑂𝑖 , 𝐸𝑖 , 𝛼̂, 𝛽̂ ) = 𝑂𝑖 + 𝛽̂ 𝐸𝑖 + 𝛼̂ where 𝛼̂ and 𝛽̂ are the maximum likelihood estimates of 𝛼 and 𝛽. In practice, these estimates of the relative risk are obtained through the following iterative procedure: 1. Start with initial values for the relative risk 𝑅𝑅𝑖 . For instance ̂ 𝑖 = 𝑂𝑖 . 𝑅𝑅 𝐸𝑖 2. Obtain estimators 𝛼̂ and 𝛽̂ using equations 𝑛 𝛼̂ 1 ̂𝑖 = ∑ 𝑅𝑅 𝛽̂ 𝑛 𝑖=1 and ̂ 𝛼 𝛽̂ 2 1 ̂ 𝛽 ̂ 𝛼 ̂𝑖 − 2 ) = 𝑛−1 ∑𝑛𝑖=1(1 + 𝐸 ) (𝑅𝑅 𝛽̂ 𝑖 34 3. Obtain new estimated values for the relative risks ̂𝑖 = 𝑅𝑅 𝑂𝑖 + 𝛽̂ 𝐸𝑖 + 𝛼̂ 4. Repeat steps 2 and 3 until estimated values for 𝛼̂ and 𝛽̂ do not change significantly. Full Bayesian smoothing If some degree of spatial dependence of the risk is assumed, meaning that the risks in close areas are similar, we can estimate the risk in one area with information borrowed from its neighbours. Since these estimates are based on more information than SMR estimates, they are more robust. The BYM model (Besag, York, and Mollié 1991) was developed to address this issue of spatial dependence of risk. First a neighbourhood structure of the study area must be defined which specifies the neighbour relationships between areas. Here, we denotes 𝜕𝑖 the set neighbours for area 𝑖. The log relative risk is then split into two terms, 𝑢𝑖 a spatial term which accounts for the spatial variations within the risk, and 𝑣𝑖 a noise which accounts for independent local variations, log(𝑅𝑅𝑖 ) = 𝑢𝑖 + 𝑣𝑖 . The spatial term is modelled with an intrinsic conditional auto-regressive (CAR) model, meaning that for each area i, 𝑢𝑖 is given conditionally to its neighbours. Specifically, it is assumed to follow a normal distribution with mean, the neighbour mean and variance, a variance parameter, 𝜎𝑢2 , divided by the number of neighbours, 𝑛𝑖 , 𝑢𝑖 |𝑢−𝑖 ∼ 𝑁 ( ∑𝑘∈𝜕𝑖 𝑢𝑘 𝜎𝑢2 , ). 𝑛𝑖 𝑛𝑖 The larger the number of neighbours, the smaller the variance. The Gaussian noises, denoted vi, are supposed to be independent and identically distributed with variance 𝜎𝑣2 . One can consider the map of the uniquely spatially structured term given by (exp(𝑢𝑖 )). They display uniquely the part of the risk which has a smooth spatial distribution. The independent term is then seen as residual. But one can also consider the estimate of the entire relative risk which leads to robust estimate thanks to the spatial CAR term, but also allows individual variability. Other models are also proposed in the RIF, the ‘CAR’ model, in which the log relative risk is only modelled by the conditional autoregressive 𝑢𝑖 term. In this model, the relative risk is assumed to be spatially smooth without any local independent variations. In the CAR and the BYM model, estimates are smoothed towards a local mean. Finally the ‘HET’ (heterogeneous) model is also proposed. In this model, the log relative risk is only composed of the independent term vi. As for empirical Bayes estimates, with this model the log relative risk estimates are smoothed toward a global mean. These two models are thus similar, they only differ by their prior (Gaussian vs gamma), and their inference method (fully Bayesian vs empirical Bayes). Prior specification Minimally informative independent inverse gamma 𝐼𝐺𝑎𝑚𝑚𝑎(0.5,0.0005) priors are assigned to the model parameters 𝜎𝑢2 and 𝜎𝑣2 . R and R-INLA The statistical calculations described above are performed by the RIF server calling an instance of a R procedure (R Core Team, 2015). The full Bayesian smoothing is performed using the integrated nested Laplace approximations (INLA), proposed by Rue et al. (2009) . Whereas Bayesian inference often makes use of Markov Chain Monte Carlo (MCMC) simulation methods (Casella and George, 1992), the increasing size and related high spatial resolution of the datasets supported by the RIF mean that even state of the art, high powered servers would take several days to 35 perform Bayesian inference via MCMC. Since INLA uses a deterministic algorithm it produces accurate results much faster than MCMC methods (Blangiardo and Cameletti, 2015). The INLA functionality is delivered through and R package called R-INLA. The website www.r-inla.org is a useful source of further information; it provides many papers, tutorials and examples that assist in the understanding and implementation of INLA. 36 Appendix B. Descriptive analysis of Sahsuland Sahsuland is a fictitious island nation of approximately 32860 km2 comprising 4 different hierarchical levels of geography. Figure XXX. Sahsuland Sahsuland population Population data is available for the period 1989-1996 in five year age groups from ages 5 to 85 and one year age groups for the ages 0 to 4. 37 Table XXX. Sahsuland population by age. Year 1989 1990 1991 1992 1993 1994 1995 1996 0-4 622,270 631,650 642,260 644,900 639,190 637,090 628,860 617,540 5-9 600,646 602,524 607,206 616,862 631,042 645,432 655,334 661,308 10-19 1,283,290 1,261,454 1,239,456 1,223,550 1,216,676 1,212,252 1,220,428 1,235,714 20-39 2,893,970 2,899,864 2,914,374 2,941,456 2,949,324 2,966,698 2,979,138 2,964,174 40-59 2,380,576 2,416,484 2,450,368 2,477,394 2,508,448 2,548,060 2,579,936 2,610,400 60-79 1,961,654 1,960,170 1,961,088 1,954,206 1,942,956 1,931,200 1,925,502 1,922,848 80+ 460,856 472,292 484,834 497,464 511,690 523,196 533,890 538,526 Total 10,203,262 10,244,438 10,299,586 10,355,832 10,399,326 10,463,928 10,523,088 10,550,510 Table XXX. Suhsuland population by level 2 geography (17 divisions, equivalent to US Counties) Year Population Min Max 1st 1989 1990 1991 1992 1993 1994 1995 1996 7,848 7,942 8,060 8,076 8,096 8,096 8,100 8,138 4,531,618 4,546,944 4,561,132 4,584,918 4,604,640 4,629,846 4,659,062 4,668,972 59,719 60,344 60,948 61,080 61,194 61,260 61,357 61,315 Divisions Quartiles Number with pop. less than: Median 3rd 100,000 500,000 1 million 145,340 620,080 6 13 14 146,260 621,506 6 13 14 146,930 625,866 6 13 14 147,852 628,008 6 13 14 148,660 629,334 6 13 14 149,888 630,394 6 13 14 150,888 631,606 6 13 14 152,100 633,123 6 13 14 Table XXX. Suhsuland population by level 3 geography (202 divisions, equivalent to US Tract) Year Population Min Max 1st 1989 1990 1991 1992 1993 1994 1995 1996 2,760 2,820 2,856 2,880 2,900 2,950 2,980 3,016 203,848 206,622 209,214 208,694 208,924 209,526 210,038 209,140 26,000 26,271 26,484 26,621 26,612 26,843 27,207 27,397 Quartiles Median 3rd 42,746 62,519 42,917 62,459 43,119 63,308 43,297 63,684 43,458 64,134 43,746 64,505 44,281 64,837 44,457 65,221 Divisions Number with pop. less than: 10,000 50,000 100,000 9 122 183 9 123 183 8 124 183 8 123 183 8 123 183 8 121 182 8 121 182 8 121 182 Table XXX. Suhsuland population by level 4 geography (1230 divisions, equivalent to US Census Block Group) Year Population Min Max 1st 1989 1990 1991 1992 1993 1994 1995 128 144 146 146 154 160 156 33,588 33,662 34,150 34,226 34,516 34,622 34,806 3,567 3,578 3,610 3,636 3,648 3,702 3,744 Quartiles Median 3rd 6,120 10,759 6,091 10,656 6,148 10,760 6,182 10,774 6,217 10,724 6,300 10,804 6,323 10,842 38 Divisions Number with pop. less than: 2,500 5,000 20,000 58 517 1122 58 515 1121 58 512 1120 55 509 1119 52 506 1119 47 501 1117 46 499 1116 1996 146 35,092 3,750 6,347 10,894 44 495 1116 Sahsuland numerator data Numerator data consists of cancer incidences. For the early years (1989 – 1994) the data is in 45 different ICD-9 codes. Until a taxonomy service is written for ICD-9 codes, this data is cannot be used. The years 1995 and 1996 have cancer data for 41 different ICD-10 codes. Table XXX. Total cases in Sahsuland, ICD-10 codes and health conditions covered by the Environment and Health Atlas*. ICD-10 Description C220 C221 C223 C229 C33 C340 C341 C342 C343 C348 C349 C500 C501 C502 C503 C504 C505 C506 C508 C509 C64 C65 C670 C671 C672 C673 C674 C675 C676 C678 C679 C710 C711 C712 C713 C714 C715 C716 C717 C718 C719 Liver cell carcinoma Intrahepatic bile duct carcinoma Angiosarcoma of liver Malignant neoplasm of liver, not specified as primary or secondary Malignant neoplasm of trachea Malignant neoplasm of main bronchus Malignant neoplasm of upper lobe, bronchus or lung Malignant neoplasm of middle lobe, bronchus or lung Malignant neoplasm of lower lobe, bronchus or lung Malignant neoplasm of overlapping sites of bronchus and lung Malignant neoplasm of unspecified part of bronchus or lung Malignant neoplasm of nipple and areola Malignant neoplasm of central portion of breast Malignant neoplasm of upper-inner quadrant of breast Malignant neoplasm of lower-inner quadrant of breast Malignant neoplasm of upper-outer quadrant of breast Malignant neoplasm of lower-outer quadrant of breast Malignant neoplasm of axillary tail of breast Malignant neoplasm of overlapping sites of breast Malignant neoplasm of breast of unspecified site Malignant neoplasm of kidney, except renal pelvis Malignant neoplasm of renal pelvis Malignant neoplasm of trigone of bladder Malignant neoplasm of dome of bladder Malignant neoplasm of lateral wall of bladder Malignant neoplasm of anterior wall of bladder Malignant neoplasm of posterior wall of bladder Malignant neoplasm of bladder neck Malignant neoplasm of ureteric orifice Malignant neoplasm of overlapping sites of bladder Malignant neoplasm of bladder, unspecified Malignant neoplasm of cerebrum, except lobes and ventricles Malignant neoplasm of frontal lobe Malignant neoplasm of temporal lobe Malignant neoplasm of parietal lobe Malignant neoplasm of occipital lobe Malignant neoplasm of cerebral ventricle Malignant neoplasm of cerebellum Malignant neoplasm of brain stem Malignant neoplasm of overlapping sites of brain Malignant neoplasm of brain, unspecified 39 Cases 1995 162 186 2 74 22 422 1,164 120 572 74 4,258 220 506 430 190 1,596 352 50 118 3,914 1,014 78 26 14 186 44 88 42 64 142 2,500 92 188 98 146 32 10 50 26 42 134 1996 156 182 4 96 6 496 1,166 178 648 38 3,728 206 452 526 238 1,596 358 78 176 3,954 862 48 42 22 110 30 92 40 38 84 2,390 52 136 78 72 30 14 40 14 12 306 Total 19,448 18,794 EHA health conditions C34 Lung cancer 6,610 6,254 C50 Breast cancer 7,376 7,584 C67 Bladder cancer 3,106 2,848 C71 Brain cancer 818 754 C22 Liver cancer 424 428 * There are no cases of prostate cancer, skin cancer, Leukaemia or mesothelioma in the Sahsuland data. Table XXX. Summary of cancer cases for 1995, by level 2 geography. All cases, top 6 individual ICD codes and health conditions covered by the Environment and Health Atlas ICD-10 code Cases Min Max 1st All C349 C509 C679 C504 C341 C64 EHA C34 C50 C67 C71 C22 8 2 4 0 0 0 0 9,000 2,112 2,218 1,216 592 488 466 119 18 21 12 7 6 5 2 4 2 0 0 3,060 3,528 1,350 366 180 37 46 20 2 2 Quartiles Median 3rd 10 298 1,255 1 60 279 2 56 219 2 22 172 3 26 125 6 20 67 5 10 67 7 88 120 44 10 8 419 444 235 59 28 40 1 1 3 7 9 Divisions Number with cases less than: 50 150 500 1500 2 6 10 13 7 11 15 16 8 12 16 16 10 13 16 17 10 13 16 17 12 15 17 17 12 16 17 17 6 6 9 13 15 10 9 13 16 16 13 13 16 17 17 16 16 17 17 17 Table XXX. Summary of cancer cases for 1996, by level 2 geography. All cases, top 6 individual ICD codes and health conditions covered by the Environment and Health Atlas ICD-10 code All C349 C509 C679 C504 C341 C64 EHA C34 C50 C67 C71 C22 Cases Min Max 10 2 4 0 0 0 0 8,962 1,794 2,174 1,250 690 542 380 1st 93 18 16 10 3 6 5 2 4 0 0 0 2,926 3,708 1,354 388 178 28 39 14 3 3 Quartiles Median 3rd 228 1,172 68 242 46 248 28 154 14 122 16 53 10 52 96 72 44 8 6 10 0 2 3 2 7 6 8 360 513 177 41 31 Divisions Number with cases less than: 50 150 500 1500 2 6 11 13 8 12 15 16 9 12 16 16 11 13 16 17 11 13 16 17 11 14 16 17 13 16 17 17 1 1 1 9 11 6 5 10 13 15 11 10 13 16 16 14 13 16 17 17 16 16 17 17 17 Table XXX. Summary of cancer cases for 1995, by level 3 geography. All cases, top 6 individual ICD codes and health conditions covered by the Environment and Health Atlas ICD-10 code All C349 C509 C679 C504 C341 C64 EHA C34 C50 C67 C71 C22 Cases Min Max 0 0 0 0 0 0 0 412 106 90 50 38 36 24 5% 20 2 2 0 0 0 0 0 0 0 0 0 172 132 54 22 12 4 6 2 0 0 25% 52 10 8 4 2 2 2 Centiles 50% 81 16 14 10 6 4 4 75% 116 28 24 16 12 8 8 95% 262 65 56 38 25 18 14 16 20 8 2 0 26 30 12 4 2 41 46 20 6 4 100 91 44 12 8 Divisions Number with cases less than: 1 5 25 100 1 2 14 127 3 24 145 201 4 26 154 202 14 54 182 202 39 93 192 202 39 113 198 202 40 122 202 202 2 1 8 49 87 12 6 38 141 176 99 68 171 202 202 192 195 202 202 202 Table XXX. Summary of cancer cases for 1996, by level 3 geography. All cases, top 6 individual ICD codes and health conditions covered by the Environment and Health Atlas ICD-10 code All C349 C509 C679 C504 C341 Cases Min 4 0 0 0 0 0 Max 360 72 130 70 62 32 5% 22 2 2 2 0 0 25% 44 8 10 6 2 2 Centiles 50% 79 14 14 10 4 4 75% 115 24 24 16 12 8 95% 267 52 54 36 28 20 41 Divisions Number with cases less than: 1 5 25 100 0 2 15 131 6 24 154 202 6 23 153 200 9 50 185 202 48 105 189 202 43 117 197 202 C64 EHA C34 C50 C67 C71 C22 0 24 0 0 4 6 12 52 131 202 202 0 0 0 0 0 132 160 74 26 14 6 6 2 0 0 14 18 6 0 0 24 32 12 2 2 41 48 20 6 3 84 100 41 12 8 1 1 7 58 90 9 9 35 141 174 107 79 177 201 202 195 191 202 202 202 Table XXX. Summary of cancer cases for 1995, by level 4 geography. All cases, top 6 individual ICD codes and health conditions covered by the Environment and Health Atlas ICD-10 code Min Max All C349 C509 C679 C504 C341 C64 EHA C34 C50 C67 C71 C22 Cases 0 0 0 0 0 0 0 70 22 26 18 16 14 8 5% 2 0 0 0 0 0 0 25% 6 0 0 0 0 0 0 0 0 0 0 0 26 32 24 8 6 0 0 0 0 0 2 2 0 0 0 Centiles 50% 75% 12 22 2 6 2 4 2 4 0 2 0 2 0 2 4 4 2 0 0 8 8 4 2 0 95% 42 12 10 8 6 4 4 17 16 8 4 2 Divisions Number with cases less than: 1 3 5 15 50 38 115 237 719 1204 374 729 918 1208 1230 402 736 952 1214 1230 573 897 1085 1228 1230 747 1041 1149 1229 1230 820 1115 1190 1230 1230 854 1126 1204 1230 1230 228 181 476 918 1,050 536 413 804 1154 1204 733 637 1025 1215 1224 1151 1150 1226 1230 1230 1230 1230 1230 1230 1230 Table XXX. Summary of cancer cases for 1996, by level 4 geography. All cases, top 6 individual ICD codes and health conditions covered by the Environment and Health Atlas ICD-10 code Min Max All C349 C509 C679 C504 C341 C64 EHA C34 C50 C67 C71 C22 Cases 0 0 0 0 0 0 0 70 22 26 18 16 14 8 5% 2 0 0 0 0 0 0 25% 6 0 0 0 0 0 0 0 0 0 0 0 26 32 24 8 6 0 0 0 0 0 2 2 0 0 0 Centiles 50% 75% 12 22 2 6 2 4 2 4 0 2 0 2 0 2 4 4 2 0 0 8 8 4 2 0 95% 42 12 10 8 6 4 4 17 16 8 4 2 42 Divisions Number with cases less than: 1 3 5 15 50 38 115 237 719 1204 374 729 918 1208 1230 402 736 952 1214 1230 573 897 1085 1228 1230 747 1041 1149 1229 1230 820 1115 1190 1230 1230 854 1126 1204 1230 1230 228 181 476 918 1,050 536 413 804 1154 1204 733 637 1025 1215 1224 1151 1150 1226 1230 1230 1230 1230 1230 1230 1230 References Besag J, York J, Mollie A. (1991). Bayesian image restoration, with applications in spatial statistics. Annals of the Institute of Statistical Mathematics, 43, 1-59. Blangiardo M. and Cameletti M. (2015). Spatial and spatio-temporal Bayesian models with R-INLA. John Wiley and Sons Ltd. Casella G. and George E. (1992). Explaining the Gibbs sampler. American Statistician, 46, 167-174. Clayton DG and Kaldor J. (1987). Empirical Bayes estimates of age-standardized relative risks for use in disease mapping. Biometrics 43, 671-681 R Core Team (2015). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/. Rue H, Martino S, and Chopin N. (2009). Approximate Bayesian inference for latent Gaussian model by using integrated Laplace approximations (with discussion). Journal of the Royal Statistical Society, Series B, 71, 319-392. 43
Source Exif Data:
File Type : PDF File Type Extension : pdf MIME Type : application/pdf PDF Version : 1.7 Linearized : No Page Count : 43 Language : en-GB Tagged PDF : Yes XMP Toolkit : 3.1-701 Producer : Microsoft® Word for Office 365 Creator : Parkes, Brandon L Creator Tool : Microsoft® Word for Office 365 Create Date : 2019:03:06 15:45:36+00:00 Modify Date : 2019:03:06 15:45:36+00:00 Document ID : uuid:61F8A885-F870-4005-86B3-13927EDF96BC Instance ID : uuid:61F8A885-F870-4005-86B3-13927EDF96BC Author : Parkes, Brandon LEXIF Metadata provided by EXIF.tools