M4 Competitors Guide
Competitor’s Guide: Prizes and Rules

Contents

The Prizes
  1. Three Major Prizes
  2. Student Prize
  3. Full Reproducibility Prize
  4. Prediction Intervals Prize
Forecasting Horizons
The dataset
The Benchmarks
Factors Affecting Forecasting Accuracy

The Prizes

There will be six Prizes awarded to the winners of the M4 Competition. The exact cash amounts to be granted (at present standing at 27,000€) will depend on securing additional sponsors, to be announced later.
Proportionally, the total amount of 20,000€ generously provided by the University of Nicosia will be distributed as follows:

Prize                        Description                                                            Percentage (%) / Amount
1st Prize                    Best performing method according to OWA                                45
2nd Prize                    Second-best performing method according to OWA                         20
3rd Prize                    Third-best performing method according to OWA                          10
Prediction Intervals Prize   Best performing method according to MSIS                               25
The UBER Student Prize       Best performing method among student competitors according to OWA     5,000€
The Amazon Prize             The best reproducible forecasting method according to OWA              2,000€

Additionally, the global taxi technology company UBER will generously award a special Student Prize of 5,000€ to the student with the most accurate forecasting method according to OWA, and Amazon will generously award 2,000€ for the best reproducible forecasting method. There is no restriction on collecting more than one prize.

1. Three Major Prizes

There will be three major Prizes for the First, Second and Third winner of the competition, who will be selected based on the performance of the participating methods according to the Overall Weighted Average (OWA) of two accuracy measures: the Mean Absolute Scaled Error (MASE [1]) and the symmetric Mean Absolute Percentage Error (sMAPE [2]). The individual measures are calculated as follows:

$$\mathrm{sMAPE} = \frac{1}{h}\sum_{t=1}^{h}\frac{2\,|Y_t - \hat{Y}_t|}{|Y_t| + |\hat{Y}_t|}$$

$$\mathrm{MASE} = \frac{\frac{1}{h}\sum_{t=1}^{h}|Y_t - \hat{Y}_t|}{\frac{1}{n-m}\sum_{t=m+1}^{n}|Y_t - Y_{t-m}|}$$

where $Y_t$ is the post-sample value of the time series at point t, $\hat{Y}_t$ the estimated forecast, h the forecasting horizon, n the number of in-sample observations and m the frequency of the data (e.g., 12 for monthly series).

[1] R. J. Hyndman, A. B. Koehler (2006). Another look at measures of forecast accuracy. International Journal of Forecasting, 22(4), 679-688.
[2] S. Makridakis, M. Hibon (2000). The M3-Competition: results, conclusions and implications. International Journal of Forecasting, 16(4), 451-476.

An example of computing the OWA is presented below using the MASE and sMAPE of the M3 Competition methods:

- Divide all errors by that of Naïve 2 to obtain the Relative MASE and the Relative sMAPE.
- Compute the OWA by averaging the Relative MASE and the Relative sMAPE, as shown in the table below.

Forecasting Method   MASE    Rank (MASE)   Relative MASE   sMAPE    Rank (sMAPE)   Relative sMAPE   OWA     Rank (OWA)
THETA                1.395   1             0.827           12.762   1              0.840            0.834   1
ForecastPro          1.422   2             0.844           13.088   3              0.861            0.852   2
ForcX                1.441   3             0.855           13.130   4              0.864            0.859   3
Comb S-H-D           1.467   6             0.870           13.056   2              0.859            0.865   4
DAMPEN               1.466   5             0.870           13.279   5              0.874            0.872   5
AutoBox2             1.484   7             0.881           13.284   6              0.874            0.877   6
PP-Autocast          1.523   10            0.904           13.600   7              0.895            0.899   7
HOLT                 1.507   8             0.894           13.777   9              0.906            0.900   8
B-J auto             1.512   9             0.897           13.819   10             0.909            0.903   9
WINTER               1.544   15            0.916           13.719   8              0.903            0.909   10
Auto-ANN             1.530   11            0.908           13.921   12             0.916            0.912   11
ARARMA               1.531   12            0.909           13.981   14             0.920            0.914   12
Flors-Pearc1         1.549   16            0.919           13.963   13             0.919            0.919   13
ROBUST-Trend         1.537   13            0.912           14.098   15             0.927            0.920   14
SMARTFCS             1.457   4             0.864           15.390   21             1.012            0.938   15
AutoBox3             1.633   19            0.969           13.913   11             0.915            0.942   16
THETAsm              1.594   18            0.946           14.286   16             0.940            0.943   17
AutoBox1             1.540   14            0.914           14.843   18             0.976            0.945   18
RBF                  1.574   17            0.934           15.464   22             1.017            0.976   19
Flors-Pearc2         1.665   21            0.988           14.742   17             0.970            0.979   20
Single               1.659   20            0.985           14.881   19             0.979            0.982   21
Naïve 2              1.685   22            1.000           15.201   20             1.000            1.000   22
Naïve 1              1.787   23            1.060           15.701   23             1.033            1.047   23

Note that MASE and sMAPE are first estimated per series by averaging the error computed per forecasting horizon, and are then averaged again across the 3003 time series to obtain their value for the whole dataset. OWA, on the other hand, is computed only once at the end for the whole sample, as shown in the table above.
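The calculation described above can be sketched in Python as follows. This is a minimal illustration and not the official code published on GitHub; the function names (`smape`, `mase`, `owa`) are hypothetical, sMAPE is returned as a fraction rather than a percentage, and the sketch assumes that no actual/forecast pair is zero at the same time and that the in-sample seasonal differences are not all zero.

```python
from statistics import mean


def smape(actual, forecast):
    """sMAPE over the h forecast points, as a fraction (multiply by 100 for %)."""
    return mean([2 * abs(y - f) / (abs(y) + abs(f))
                 for y, f in zip(actual, forecast)])


def mase(insample, actual, forecast, m):
    """Mean absolute forecast error scaled by the in-sample mean absolute
    lag-m difference (m is the frequency of the data)."""
    scale = mean([abs(insample[t] - insample[t - m])
                  for t in range(m, len(insample))])
    return mean([abs(y - f) for y, f in zip(actual, forecast)]) / scale


def owa(mase_method, smape_method, mase_naive2, smape_naive2):
    """Average of the Relative MASE and the Relative sMAPE, both taken
    against Naive 2 on the whole dataset."""
    return (mase_method / mase_naive2 + smape_method / smape_naive2) / 2


# Theta row of the M3 table: MASE 1.395, sMAPE 12.762; Naive 2: 1.685, 15.201
theta_owa = owa(1.395, 12.762, 1.685, 15.201)
print(round(theta_owa, 3))  # the table reports 0.834 for Theta
```

Because OWA uses only ratios against Naïve 2, it does not matter whether sMAPE is expressed as a fraction or as a percentage, provided both methods use the same convention.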
In the M3 example above, the most accurate method, i.e., the one with the smallest OWA, is Theta, which would have won the first prize; the second most accurate is ForecastPro, which would have won the second prize; and the third most accurate is ForcX, which would have won the third prize. The code for computing the OWA is available on GitHub.

2. Student Prize

A prize will be awarded to the student with the best performing method according to OWA.

3. Full Reproducibility Prize

The prerequisite for the Full Reproducibility Prize is that the code used for generating the forecasts, with the exception of companies providing forecasting services and those claiming proprietary software, will be put on GitHub no later than 10 days after the end of the competition (i.e., the 10th of June, 2018). In addition, there must be instructions on how to exactly reproduce the submitted M4 forecasts. In this regard, individuals and companies will be able to use the code and the instructions provided, crediting the person/group that developed them, to improve their organizational forecasts.

Companies providing forecasting services and those claiming proprietary software will have to provide the organizers with a detailed description of how their forecasts were made and a source or execution file for reproducing their forecasts for 100 randomly selected series. Given the critical importance of objectivity and replicability, such a description and file will be mandatory for participating in the Competition. An execution file can be submitted if the source program needs to be kept confidential or, alternatively, a source program with a termination date for running it.

The code for reproducing the results of the 4Theta method, submitted by the Forecasting & Strategy Unit, was put on GitHub on 21-12-2017. This method will not be considered for any of the Prizes.

4. Prediction Intervals Prize

The M4 Competition adopts a 95% Prediction Interval (PI) for estimating the uncertainty around the point forecasts. The performance of the generated PI will be evaluated using the Mean Scaled Interval Score (MSIS [3]) as follows:

$$\mathrm{MSIS} = \frac{\frac{1}{h}\sum_{t=1}^{h}\left[(U_t - L_t) + \frac{2}{a}(L_t - Y_t)\,\mathbf{1}\{Y_t < L_t\} + \frac{2}{a}(Y_t - U_t)\,\mathbf{1}\{Y_t > U_t\}\right]}{\frac{1}{n-m}\sum_{t=m+1}^{n}|Y_t - Y_{t-m}|}$$

where $L_t$ and $U_t$ are the Lower and Upper bounds of the prediction intervals, $Y_t$ are the future observations of the series, a is the significance level and $\mathbf{1}$ is the indicator function (being 1 if the condition in braces holds, i.e., the observation falls outside the interval on that side, and 0 otherwise). Given that forecasters will be asked to generate 95% prediction intervals, a is set to 0.05.

An example of computing the MSIS is presented below using the prediction intervals generated by two different methods for 18-step-ahead forecasts:

- A penalty is calculated for each method at the points where the future values are outside the specified bounds.
- The width of the prediction interval is added to the penalty, if any, to get the Interval Score (IS).
- The IS values estimated at the individual points are averaged to get the Mean Interval Score (MIS).
- MIS is scaled by dividing its value by the mean absolute seasonal difference of the series (here 200).
- After estimating MSIS for all the M4 Competition series, its average value is computed to evaluate the total performance of the method.

[3] T. Gneiting, A. E. Raftery (2007). Strictly Proper Scoring Rules, Prediction, and Estimation. Journal of the American Statistical Association, 102(477), 359-378.
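The interval score steps above can be sketched in Python as follows. This is an illustrative sketch only, not the organizers' evaluation code; the function names (`interval_score`, `msis`) and the `scale` argument (the in-sample mean absolute seasonal difference, given as 200 in the worked example) are assumptions made for this illustration. The sample values are the first three horizons of the first method in the worked example.

```python
def interval_score(lower, upper, y, a=0.05):
    """Interval width plus a (2 / a)-weighted penalty when y lies outside
    the [lower, upper] prediction interval."""
    penalty = 0.0
    if y < lower:
        penalty = (2 / a) * (lower - y)
    elif y > upper:
        penalty = (2 / a) * (y - upper)
    return (upper - lower) + penalty


def msis(lowers, uppers, actuals, scale, a=0.05):
    """Mean interval score over the horizon divided by `scale`, the in-sample
    mean absolute seasonal difference (assumed to be precomputed)."""
    scores = [interval_score(l, u, y, a)
              for l, u, y in zip(lowers, uppers, actuals)]
    return sum(scores) / len(scores) / scale


# Horizons 1-3 of method 1 in the worked example (L1, U1, Y)
lowers = [289, 266, 313]
uppers = [938, 923, 992]
actuals = [654, 492, 171]

for l, u, y in zip(lowers, uppers, actuals):
    print(interval_score(l, u, y))

print(msis(lowers, uppers, actuals, scale=200))
```

At horizon 3 the observation (171) falls below the lower bound (313), so the score is the width 679 plus the penalty 40 × 142 = 5680, which matches the IS1 value of 6359 reported for that horizon.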
MSIS worked example (18-step-ahead forecasts; L1/U1 and L2/U2 are the bounds of the two methods, Y the future observations):

Forecasting Horizon   L1    U1     L2    U2    Y     Penalty1   Penalty2   IS1    IS2
1                     289   938    297   865   654   0          0          649    568
2                     266   923    304   873   492   0          0          657    569
3                     313   992    312   880   171   5680       5640       6359   6208
4                     238   949    319   888   342   0          0          711    569
5                     224   1008   327   895   591   0          0          784    568
6                     209   1014   334   903   672   0          0          805    569
7                     206   1040   342   910   465   0          0          834    568
8                     175   1041   349   918   255   0          3760       866    4329
9                     164   1067   357   926   864   0          0          903    569
10                    150   1078   364   933   768   0          0          928    569
11                    138   1094   372   941   672   0          0          956    569
12                    120   1104   379   948   519   0          0          984    569
13                    109   1121   387   956   519   0          0          1012   569
14                    96    1133   395   963   591   0          0          1037   568
15                    83    1146   402   971   480   0          0          1063   569
16                    70    1157   410   978   564   0          0          1087   568
17                    58    1170   417   986   579   0          0          1112   569
18                    46    1182   425   993   423   0          80         1136   648
MIS                                                                        1216   1095
MSIS                                                                       6.08   5.48

Forecasting Horizons

The number of forecasts required by each method is 6 for yearly data, 8 for quarterly, 18 for monthly, 13 for weekly, 14 for daily and 48 for hourly. The accuracy measures are computed for each horizon separately and then combined to cover, in a weighted fashion, all horizons together for each of the two accuracy measures (MASE and sMAPE).

The dataset

The M4 dataset consists of 100,000 time series of Yearly, Quarterly, Monthly and Other (Weekly, Daily and Hourly) data. The minimum number of observations is 13 for yearly, 16 for quarterly, 42 for monthly, 80 for weekly, 93 for daily and 700 for hourly series.

The 100,000 time series of the dataset come mainly from the Economic, Finance, Demographics and Industry areas, while also including data from Tourism, Trade, Labor and Wage, Real Estate, Transportation, Natural Resources and the Environment. The M4 Competition series, like those of the M1 and M3 Competitions, aim at representing the real world as much as possible. The series were selected randomly from a database of 900,000 series on December 28, 2017. Professor Makridakis chose the seed number for generating the random sample that determined the M4 Competition data.
Some pre-defined filters were applied beforehand to achieve some desired characteristics, such as the length of the series, the percentage of Yearly, Quarterly, Monthly, Weekly, Daily, and Hourly data, as well as their type (Micro, Macro, Finance, Industry, Demographic, Other). Below is the number of time series based on their frequency and type:

Frequency   Demographic   Finance   Industry   Macro    Micro    Other   Total
Yearly      1,088         6,519     3,716      3,903    6,538    1,236   23,000
Quarterly   1,858         5,305     4,637      5,315    6,020    865     24,000
Monthly     5,728         10,987    10,017     10,016   10,975   277     48,000
Weekly      24            164       6          41       112      12      359
Daily       10            1,559     422        127      1,476    633     4,227
Hourly      0             0         0          0        0        414     414
Total       8,708         24,534    18,798     19,402   25,121   3,437   100,000

You can download the dataset here. There you may also find additional information regarding the type, the frequency and the number of forecasts required per series. In brief, the M4-Info.csv file provides the following information:

- M4id: The id of the time series. This is used as a reference. For instance, “Y100” corresponds to the 100th series of the Yearly data.
- Category: The type of the time series (e.g. Macro, Micro, Financial etc.)
- Frequency: The frequency of the time series considered. This corresponds to the m value used for estimating MASE. Note that this does not mean that different or multiple seasonality cannot be considered by the competitors.
- Horizon: The forecasting horizon, i.e., the number of periods ahead for which the competitors need to generate forecasts.
- SP: The Seasonal Period (e.g. Yearly, Monthly, Weekly etc.)

The M4DataSet.rar file contains the historical data for training a forecasting model. A separate file is given per data frequency. The first row displays the M4id, while the rest contain the historical data. No timestamp is provided.

The Benchmarks

There will be ten benchmark methods, eight used in the M3 Competition and two extra ones based on ML concepts.
As these methods are well known, readily available and straightforward to apply, the new methods proposed in the M4 Competition must provide superior accuracy in order to be adopted and used in practice (taking also into account the additional computational time required to use a more accurate method compared with the benchmarks, whose computational requirements are minimal).

1. Naïve 1: $F_{t+i} = Y_t$, i = 1, 2, 3, … , m
2. Seasonal Naïve: Forecasts are equal to the last known observation of the same period.
3. Naïve 2: Like Naïve 1, but the data is seasonally adjusted, if needed, by applying classical multiplicative decomposition (R stats package). A 90% autocorrelation test is performed, when using the R package, to decide whether the data is seasonal.
4. Simple Exponential Smoothing (S): ses() function from v8.2 of the forecast package for R. Seasonality is considered like in Naïve 2.
5. Holt’s Exponential Smoothing (H): holt() function from v8.2 of the forecast package for R. Seasonality is considered like in Naïve 2.
6. Damped Exponential Smoothing (D): holt() function from v8.2 of the forecast package for R. Seasonality is considered like in Naïve 2.
7. Combining S-H-D: The arithmetic average of methods 4, 5 and 6.
8. Theta: As applied to the M3 Competition data (θ=2, seasonal adjustments like in Naïve 2, and SES applied using the ses() function from v8.2 of the forecast package for R).
9. MLP: A perceptron of a very basic architecture and parameterization (developed in Python using the scikit-learn library v0.19.1, available on GitHub).
10. RNN: A recurrent network of a very basic architecture and parameterization (developed in Python using the Keras v2.0.9 and TensorFlow v1.4.0 libraries, available on GitHub).

The code for generating the forecasts of the benchmarks mentioned above is available on GitHub.
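To make the simplest benchmarks concrete, Naïve 1, Seasonal Naïve and the arithmetic-average combination can be sketched in Python as below. This is a minimal sketch and not the official benchmark code on GitHub; the function names (`naive1`, `seasonal_naive`, `combine`) are hypothetical, and the seasonality test and seasonal adjustment used by Naïve 2 and the smoothing methods are deliberately left out.

```python
def naive1(history, h):
    """Naive 1: every forecast equals the last known observation."""
    return [history[-1]] * h


def seasonal_naive(history, h, m):
    """Seasonal Naive: each forecast equals the last known observation
    of the same period, where m is the frequency of the data."""
    return [history[-m + (i % m)] for i in range(h)]


def combine(*forecasts):
    """Arithmetic average of several forecast vectors of equal length,
    as used by the Combining S-H-D benchmark."""
    return [sum(values) / len(values) for values in zip(*forecasts)]


history = [10, 20, 30, 40, 12, 22, 32, 42]   # two "years" of quarterly data
print(naive1(history, 3))                     # [42, 42, 42]
print(seasonal_naive(history, 6, 4))          # [12, 22, 32, 42, 12, 22]
print(combine([1, 2], [3, 4], [5, 6]))        # [3.0, 4.0]
```

With m = 1 the seasonal variant reduces to Naïve 1, which is why the two benchmarks coincide on non-seasonal (e.g. yearly) data.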
Note that the benchmarks are not eligible for a prize, meaning that the total amount of prizes will be distributed among the competing participants even if some benchmark performs better than the forecasts submitted by the participants.

Factors Affecting Forecasting Accuracy

The M4 will provide a unique opportunity to identify the factors affecting forecasting accuracy. Having 100,000 series, with an average of 12 forecasts for each, more than 100 forecasting methods and 2 accuracy measures will result in about a quarter of a billion data points (100,000 × 12 × 100 × 2 = 240 million). Data analytics will be applied to discover patterns and relationships, exploiting the findings to enrich our understanding of forecasting accuracy and the factors that affect it.
Author: Spyros Makridakis. Document date: 16 March 2018.