M4 Competitors Guide

Competitor’s Guide: Prizes and Rules

Contents

The Prizes
  1. Three Major Prizes
  2. Student Prize
  3. Full Reproducibility Prize
  4. Prediction Intervals Prize
Forecasting Horizons
The dataset
The Benchmarks
Factors Affecting Forecasting Accuracy


The Prizes
There will be six Prizes awarded to the winners of the M4 Competition. The exact cash amounts to be granted (at present standing at 27,000€) will depend on securing additional sponsors, to be announced later. The total amount of 20,000€ generously provided by the University of Nicosia will be distributed proportionally as follows:
| Prize | Description | Percentage (%) / Amount |
|---|---|---|
| 1st Prize | Best performing method according to OWA | 45 |
| 2nd Prize | Second-best performing method according to OWA | 20 |
| 3rd Prize | Third-best performing method according to OWA | 10 |
| Prediction Intervals Prize | Best performing method according to MSIS | 25 |
| The UBER Student Prize | Best performing method among student competitors according to OWA | 5,000€ |
| The Amazon Prize | The best reproducible forecasting method according to OWA | 2,000€ |

Additionally, the global taxi technology company UBER will generously award a special Student Prize of 5,000€ to the student with the most accurate forecasting method according to OWA, and Amazon will generously award 2,000€ for the best reproducible forecasting method.
There are no restrictions on collecting more than one prize.

1. Three Major Prizes
There will be three major Prizes for the First, Second and Third winner of the competition who will be
selected based on the performance of the participating methods according to the Overall Weighted
Average (OWA) of two accuracy measures: the Mean Absolute Scaled Error (MASE1) and the symmetric
Mean Absolute Percentage Error (sMAPE2). The individual measures are calculated as follows:
$$\text{sMAPE} = \frac{1}{h}\sum_{t=1}^{h}\frac{2\,|Y_t - \hat{Y}_t|}{|Y_t| + |\hat{Y}_t|}$$

$$\text{MASE} = \frac{\frac{1}{h}\sum_{t=1}^{h}|Y_t - \hat{Y}_t|}{\frac{1}{n-m}\sum_{t=m+1}^{n}|Y_t - Y_{t-m}|}$$

where $Y_t$ is the post-sample value of the time series at point t, $\hat{Y}_t$ the estimated forecast, h the forecasting horizon, n the number of available in-sample observations, and m the frequency of the data (e.g., 12 for monthly series).
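For illustration, here is a minimal Python sketch of the two measures for a single series, following the definitions above (the function and variable names are illustrative, not taken from the official competition code):

```python
import numpy as np

def smape(y_true, y_pred):
    """sMAPE as defined above, averaged over the h forecasted points."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return np.mean(2.0 * np.abs(y_true - y_pred) / (np.abs(y_true) + np.abs(y_pred)))

def mase(insample, y_true, y_pred, m):
    """MASE: the out-of-sample MAE scaled by the in-sample MAE of the
    seasonal naive forecast with lag m (e.g., m = 12 for monthly data)."""
    insample = np.asarray(insample, float)
    # (1 / (n - m)) * sum_{t=m+1..n} |Y_t - Y_{t-m}|
    scale = np.mean(np.abs(insample[m:] - insample[:-m]))
    return np.mean(np.abs(np.asarray(y_true, float) - np.asarray(y_pred, float))) / scale
```

Note that the sMAPE values in the table below correspond to this quantity expressed as a percentage (the fraction above multiplied by 100).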
An example of computing the OWA is presented below using the MASE and sMAPE of the M3 Competition methods:

- Divide all errors by those of Naïve 2 to obtain the Relative MASE and the Relative sMAPE.
- Compute the OWA by averaging the Relative MASE and the Relative sMAPE, as shown in the table below.

[1] R. J. Hyndman, A. B. Koehler (2006). Another look at measures of forecast accuracy. International Journal of Forecasting, 22(4), 679-688.
[2] S. Makridakis, M. Hibon (2000). The M3-Competition: results, conclusions and implications. International Journal of Forecasting, 16(4), 451-476.

| Forecasting Method | MASE | Rank (MASE) | Relative MASE | sMAPE | Rank (sMAPE) | Relative sMAPE | OWA | Rank (OWA) |
|---|---|---|---|---|---|---|---|---|
| THETA | 1.395 | 1 | 0.827 | 12.762 | 1 | 0.840 | 0.834 | 1 |
| ForecastPro | 1.422 | 2 | 0.844 | 13.088 | 3 | 0.861 | 0.852 | 2 |
| ForcX | 1.441 | 3 | 0.855 | 13.130 | 4 | 0.864 | 0.859 | 3 |
| Comb S-H-D | 1.467 | 6 | 0.870 | 13.056 | 2 | 0.859 | 0.865 | 4 |
| DAMPEN | 1.466 | 5 | 0.870 | 13.279 | 5 | 0.874 | 0.872 | 5 |
| AutoBox2 | 1.484 | 7 | 0.881 | 13.284 | 6 | 0.874 | 0.877 | 6 |
| PP-Autocast | 1.523 | 10 | 0.904 | 13.600 | 7 | 0.895 | 0.899 | 7 |
| HOLT | 1.507 | 8 | 0.894 | 13.777 | 9 | 0.906 | 0.900 | 8 |
| B-J auto | 1.512 | 9 | 0.897 | 13.819 | 10 | 0.909 | 0.903 | 9 |
| WINTER | 1.544 | 15 | 0.916 | 13.719 | 8 | 0.903 | 0.909 | 10 |
| Auto-ANN | 1.530 | 11 | 0.908 | 13.921 | 12 | 0.916 | 0.912 | 11 |
| ARARMA | 1.531 | 12 | 0.909 | 13.981 | 14 | 0.920 | 0.914 | 12 |
| Flors-Pearc1 | 1.549 | 16 | 0.919 | 13.963 | 13 | 0.919 | 0.919 | 13 |
| ROBUSTTrend | 1.537 | 13 | 0.912 | 14.098 | 15 | 0.927 | 0.920 | 14 |
| SMARTFCS | 1.457 | 4 | 0.864 | 15.390 | 21 | 1.012 | 0.938 | 15 |
| AutoBox3 | 1.633 | 19 | 0.969 | 13.913 | 11 | 0.915 | 0.942 | 16 |
| THETAsm | 1.594 | 18 | 0.946 | 14.286 | 16 | 0.940 | 0.943 | 17 |
| AutoBox1 | 1.540 | 14 | 0.914 | 14.843 | 18 | 0.976 | 0.945 | 18 |
| RBF | 1.574 | 17 | 0.934 | 15.464 | 22 | 1.017 | 0.976 | 19 |
| Flors-Pearc2 | 1.665 | 21 | 0.988 | 14.742 | 17 | 0.970 | 0.979 | 20 |
| Single | 1.659 | 20 | 0.985 | 14.881 | 19 | 0.979 | 0.982 | 21 |
| Naïve 2 | 1.685 | 22 | 1.000 | 15.201 | 20 | 1.000 | 1.000 | 22 |
| Naïve 1 | 1.787 | 23 | 1.060 | 15.701 | 23 | 1.033 | 1.047 | 23 |

Note that MASE and sMAPE are first estimated per series by averaging the errors computed per forecasting horizon, and then averaged again across the 3,003 time series of the M3 Competition to compute their value for the whole dataset. On the other hand, OWA is computed only once at the end for the whole sample, as shown in the table above.
In the above example, the most accurate method, with the smallest OWA, is Theta, which would have won the first prize; the second most accurate is ForecastPro, which would have won the second prize; and the third is ForcX, which would have won the third prize.
The code for computing the OWA is available on GitHub.
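In essence, the computation reduces to a few lines. The sketch below reproduces the Theta row of the table above (the official implementation is the GitHub code just mentioned):

```python
# MASE and sMAPE of the method (Theta) and of Naive 2, taken from the table above
mase_method, smape_method = 1.395, 12.762
mase_naive2, smape_naive2 = 1.685, 15.201

relative_mase = mase_method / mase_naive2     # ~0.828 (table shows 0.827, from unrounded errors)
relative_smape = smape_method / smape_naive2  # ~0.840
owa = (relative_mase + relative_smape) / 2    # ~0.834 -> the smallest OWA, rank 1
```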

2. Student Prize
A prize will be awarded to the student submitting the best performing method according to OWA.


3. Full Reproducibility Prize
The prerequisite for the Full Reproducibility Prize is that the code used for generating the forecasts (with the exception of companies providing forecasting services and those claiming proprietary software) be put on GitHub no later than 10 days after the end of the competition (i.e., the 10th of June, 2018). In addition, there must be instructions on how to exactly reproduce the submitted M4 forecasts. Individuals and companies will then be able to use the code and the instructions provided, crediting the person/group that developed them, to improve their organizational forecasts.
Companies providing forecasting services and those claiming proprietary software will have to provide the organizers with a detailed description of how their forecasts were made, together with a source or execution file for reproducing their forecasts for 100 randomly selected series. Given the critical importance of objectivity and replicability, such a description and file are mandatory for participating in the Competition. An execution file can be submitted in case the source program needs to be kept confidential, or, alternatively, a source program with a termination date for running it.
The code for reproducing the results of the 4Theta method, submitted by the Forecasting & Strategy Unit,
was put on GitHub on 21-12-2017. This method will not be considered for any of the Prizes.

4. Prediction Intervals Prize
The M4 Competition adopts a 95% Prediction Interval (PI) for estimating the uncertainty around the point forecasts. The performance of the generated PIs will be evaluated using the Mean Scaled Interval Score (MSIS [3]), defined as follows:
$$\text{MSIS} = \frac{\frac{1}{h}\sum_{t=1}^{h}\left[(U_t - L_t) + \frac{2}{a}(L_t - Y_t)\mathbf{1}\{Y_t < L_t\} + \frac{2}{a}(Y_t - U_t)\mathbf{1}\{Y_t > U_t\}\right]}{\frac{1}{n-m}\sum_{t=m+1}^{n}|Y_t - Y_{t-m}|}$$
where $L_t$ and $U_t$ are the lower and upper bounds of the prediction interval, $Y_t$ are the future observations of the series, $a$ is the significance level, and $\mathbf{1}$ is the indicator function (being 1 if the stated condition holds, i.e., if the future value lies outside the postulated interval, and 0 otherwise). Given that forecasters will be asked to generate 95% prediction intervals, $a$ is set to 0.05.
An example of computing the MSIS is presented below, using the prediction intervals generated by two different methods for 18-step-ahead forecasts (a code sketch follows the table):

- A penalty is calculated for each method at the points where the future values fall outside the specified bounds.
- The width of the prediction interval is added to the penalty, if any, to obtain the Interval Score (IS).
- The IS values estimated at the individual points are averaged to get the MIS value.
- MIS is scaled by dividing its value by the mean absolute seasonal difference of the series (here 200) to get the MSIS.
- After estimating the MSIS for all the M4 Competition series, its average value is computed to evaluate the overall performance of the method.

[3] T. Gneiting, A. E. Raftery (2007). Strictly Proper Scoring Rules, Prediction, and Estimation. Journal of the American Statistical Association, 102(477), 359-378.


| Forecasting Horizon | L1 | U1 | L2 | U2 | Y | Penalty1 | Penalty2 | IS1 | IS2 |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 289 | 938 | 297 | 865 | 654 | 0 | 0 | 649 | 568 |
| 2 | 266 | 923 | 304 | 873 | 492 | 0 | 0 | 657 | 569 |
| 3 | 313 | 992 | 312 | 880 | 171 | 5680 | 5640 | 6359 | 6208 |
| 4 | 238 | 949 | 319 | 888 | 342 | 0 | 0 | 711 | 569 |
| 5 | 224 | 1008 | 327 | 895 | 591 | 0 | 0 | 784 | 568 |
| 6 | 209 | 1014 | 334 | 903 | 672 | 0 | 0 | 805 | 569 |
| 7 | 206 | 1040 | 342 | 910 | 465 | 0 | 0 | 834 | 568 |
| 8 | 175 | 1041 | 349 | 918 | 255 | 0 | 3760 | 866 | 4329 |
| 9 | 164 | 1067 | 357 | 926 | 864 | 0 | 0 | 903 | 569 |
| 10 | 150 | 1078 | 364 | 933 | 768 | 0 | 0 | 928 | 569 |
| 11 | 138 | 1094 | 372 | 941 | 672 | 0 | 0 | 956 | 569 |
| 12 | 120 | 1104 | 379 | 948 | 519 | 0 | 0 | 984 | 569 |
| 13 | 109 | 1121 | 387 | 956 | 519 | 0 | 0 | 1012 | 569 |
| 14 | 96 | 1133 | 395 | 963 | 591 | 0 | 0 | 1037 | 568 |
| 15 | 83 | 1146 | 402 | 971 | 480 | 0 | 0 | 1063 | 569 |
| 16 | 70 | 1157 | 410 | 978 | 564 | 0 | 0 | 1087 | 568 |
| 17 | 58 | 1170 | 417 | 986 | 579 | 0 | 0 | 1112 | 569 |
| 18 | 46 | 1182 | 425 | 993 | 423 | 0 | 80 | 1136 | 648 |
| MIS | | | | | | | | 1216 | 1095 |
| MSIS | | | | | | | | 6.08 | 5.48 |
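The following Python sketch reproduces these steps under the same assumptions (a = 0.05 and a scaling denominator of 200 for this series); the names are illustrative, not from the official code:

```python
import numpy as np

def msis(lower, upper, y_true, scale, a=0.05):
    """Mean Scaled Interval Score for a (1 - a) prediction interval.
    `scale` is the mean absolute seasonal difference of the series
    (here 200), i.e., the same denominator used for MASE."""
    lower, upper, y_true = (np.asarray(x, float) for x in (lower, upper, y_true))
    width = upper - lower
    penalty = (2.0 / a) * (np.maximum(lower - y_true, 0.0)    # Y below L
                           + np.maximum(y_true - upper, 0.0))  # Y above U
    interval_scores = width + penalty        # IS at each horizon
    return np.mean(interval_scores) / scale  # MIS / scale = MSIS
```

For Method 1 in the table this yields 1216 / 200 = 6.08, and for Method 2, 1095 / 200 ≈ 5.48.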

Forecasting Horizons
The number of forecasts required from each method is 6 for yearly data, 8 for quarterly, 18 for monthly, 13 for weekly, 14 for daily and 48 for hourly. The accuracy measures are computed for each horizon separately and then combined, in a weighted fashion, across all horizons for each of the two accuracy measures (MASE and sMAPE).
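Expressed as a simple lookup (a sketch; the keys mirror the frequencies named above):

```python
# Required number of forecasts (horizon) per data frequency
HORIZONS = {"Yearly": 6, "Quarterly": 8, "Monthly": 18,
            "Weekly": 13, "Daily": 14, "Hourly": 48}
```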

The dataset
The M4 dataset consists of 100,000 time series of Yearly, Quarterly, Monthly and Other (Weekly, Daily and Hourly) data. The minimum number of observations is 13 for yearly, 16 for quarterly, 42 for monthly, 80 for weekly, 93 for daily and 700 for hourly series.
The 100,000 time series come mainly from the Economic, Finance, Demographics and Industry areas, while also including data from Tourism, Trade, Labor and Wage, Real Estate, Transportation, Natural Resources and the Environment.
The M4 Competition series, like those of the M1 and M3 Competitions, aim at representing the real world as much as possible. The series were selected randomly from a database of 900,000 series on December 28, 2017.

Professor Makridakis chose the seed number for generating the random sample that determined the M4
Competition data. Some pre-defined filters were applied beforehand to achieve some desired
characteristics, such as the length of the series, the percentage of Yearly, Quarterly, Monthly, Weekly,
Daily, and Hourly data, as well as their type (Micro, Macro, Finance, Industry, Demographic, Other).
Below is the number of time series based on their frequency and type:
| Frequency | Demographic | Finance | Industry | Macro | Micro | Other | Total |
|---|---|---|---|---|---|---|---|
| Yearly | 1,088 | 6,519 | 3,716 | 3,903 | 6,538 | 1,236 | 23,000 |
| Quarterly | 1,858 | 5,305 | 4,637 | 5,315 | 6,020 | 865 | 24,000 |
| Monthly | 5,728 | 10,987 | 10,017 | 10,016 | 10,975 | 277 | 48,000 |
| Weekly | 24 | 164 | 6 | 41 | 112 | 12 | 359 |
| Daily | 10 | 1,559 | 422 | 127 | 1,476 | 633 | 4,227 |
| Hourly | 0 | 0 | 0 | 0 | 0 | 414 | 414 |
| Total | 8,708 | 24,534 | 18,798 | 19,402 | 25,121 | 3,437 | 100,000 |

You can download the dataset here. There you may also find additional information regarding the type, the frequency and the number of forecasts required per series.
In brief, the M4-Info.csv file provides the following information:

- M4id: The id of the time series, used as a reference. For instance, “Y100” corresponds to the 100th series of the Yearly data.
- Category: The type of the time series (e.g., Macro, Micro, Financial, etc.).
- Frequency: The frequency of the time series considered. This corresponds to the m value used for estimating MASE. Note that this does not mean that different or multiple seasonalities cannot be considered by the competitors.
- Horizon: The forecasting horizon, i.e., the number of periods ahead for which the competitors need to generate forecasts.
- SP: The Seasonal Period (e.g., Yearly, Monthly, Weekly, etc.).

The M4DataSet.rar file contains the historical data for training a forecasting model. A separate file is given per data frequency. The first row displays the M4id, while the rest contain the historical data. No timestamp is provided.
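As an example, the metadata for a given series could be looked up as follows (a sketch that assumes the columns of M4-Info.csv are named exactly as listed above; verify the actual header after downloading):

```python
import pandas as pd

# Assumed layout: columns M4id, Category, Frequency, Horizon, SP
info = pd.read_csv("M4-Info.csv").set_index("M4id")

row = info.loc["Y100"]       # the 100th Yearly series
h = int(row["Horizon"])      # number of forecasts to submit (6 for yearly)
m = int(row["Frequency"])    # the m used when computing MASE
print(row["SP"], row["Category"], h, m)
```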

The Benchmarks
There will be ten benchmark methods: eight used in the M3 Competition and two extra ones based on ML concepts. As these methods are well known, readily available and straightforward to apply, the new methods proposed in the M4 Competition must provide superior accuracy in order to be adopted and used in practice (taking also into account the computational time required to utilize a more accurate method versus the benchmarks, whose computational requirements are minimal).

1. Naïve 1: $F_{t+i} = Y_t$, $i = 1, 2, 3, \ldots, m$
2. Seasonal Naïve: Forecasts are equal to the last known observation of the same period.
3. Naïve 2: Like Naïve 1, but the data are seasonally adjusted, if needed, by applying classical multiplicative decomposition (R stats package). A 90% autocorrelation test is performed, when using the R package, to decide whether the data are seasonal.
4. Simple Exponential Smoothing (S): ses() function from v8.2 of the forecast package for R. Seasonality is handled as in Naïve 2.

5. Holt’s Exponential Smoothing (H): holt() function from v8.2 of the forecast package for R. Seasonality is handled as in Naïve 2.
6. Dampen Exponential Smoothing (D): holt() function from v8.2 of the forecast package for R. Seasonality is handled as in Naïve 2.
7. Combining S-H-D: The arithmetic average of methods 4, 5 and 6.
8. Theta: As applied to the M3 Competition data (θ = 2, seasonal adjustments as in Naïve 2, and SES applied using the ses() function from v8.2 of the forecast package for R).
9. MLP: A perceptron of a very basic architecture and parameterization, developed in Python using the scikit-learn library v0.19.1 (available on GitHub).
10. RNN: A recurrent network of a very basic architecture and parameterization, developed in Python using the Keras v2.0.9 and TensorFlow v1.4.0 libraries (available on GitHub).

The code for generating the forecasts of the benchmarks mentioned above is available on GitHub.
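For orientation, here is a minimal Python sketch of the two simplest benchmarks, Naïve 1 and Seasonal Naïve (the code linked above is the reference implementation):

```python
import numpy as np

def naive1(insample, h):
    """Naive 1: repeat the last observation over all h horizons."""
    return np.full(h, float(insample[-1]))

def seasonal_naive(insample, h, m):
    """Seasonal Naive: each forecast equals the last known observation
    of the same period, i.e., F_{t+i} = Y_{t+i-m} with lag m (the frequency)."""
    insample = np.asarray(insample, float)
    return np.array([insample[-m + (i % m)] for i in range(h)])
```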
Note that the benchmarks are not eligible for a prize, meaning that the total amount of the prizes will be distributed among the competing participants even if some benchmarks perform better than the forecasts submitted by the participants.

Factors Affecting Forecasting Accuracy
The M4 will provide a unique opportunity to identify the factors affecting forecasting accuracy. Having 100,000 series, with an average of about 12 forecasts each, more than 100 forecasting methods and 2 accuracy measures results in about a quarter of a billion data points (100,000 × 12 × 100 × 2 = 240 million). Data analytics will be applied to discover patterns and relationships, exploiting the findings to enrich our understanding of forecasting accuracy and the factors that affect it.
