M4 Competitors Guide

Competitor’s Guide: Prizes and Rules

Contents
The Prizes
   1. Three Major Prizes
   2. Student Prize
   3. Full Reproducibility Prize
   4. Prediction Intervals Prize
Forecasting Horizons
The dataset
The Benchmarks
Factors Affecting Forecasting Accuracy

The Prizes
There will be six Prizes awarded to the winners of the M4 Competition. The exact cash amounts to be granted (currently 27,000€ in total) will depend on securing additional sponsors, to be announced later. The 20,000€ generously provided by the University of Nicosia will be distributed proportionally as follows:
Prize                       Description                                                         Share (%) / Amount
1st Prize                   Best performing method according to OWA                             45%
2nd Prize                   Second-best performing method according to OWA                      20%
3rd Prize                   Third-best performing method according to OWA                       10%
Prediction Intervals Prize  Best performing method according to MSIS                            25%
The UBER Student Prize      Best performing method among student competitors according to OWA   5,000€
The Amazon Prize            The best reproducible forecasting method according to OWA           2,000€

Additionally, the global taxi technology company UBER will generously award a special Student Prize of 5,000€ to the student with the most accurate forecasting method according to OWA, and Amazon will generously award 2,000€ for the best reproducible forecasting method.
There are no restrictions on collecting more than one prize.
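As a quick arithmetic check of the prize table, the sketch below converts the percentages into euro amounts and adds the two sponsored prizes. It assumes the 20,000€ pool stays unchanged (the amounts would change if additional sponsors are secured), and the helper names are hypothetical.

```python
"""Sketch: convert the prize table into euro amounts (assumes a 20,000€ pool)."""

POOL_EUR = 20_000  # amount provided by the University of Nicosia

# Share of the pool per prize, in percent (from the prize table).
POOL_SHARES_PCT = {
    "1st Prize": 45,
    "2nd Prize": 20,
    "3rd Prize": 10,
    "Prediction Intervals Prize": 25,
}

# Sponsored prizes with fixed amounts, in euros.
SPONSOR_PRIZES_EUR = {
    "The UBER Student Prize": 5_000,
    "The Amazon Prize": 2_000,
}


def prize_amounts(pool_eur: int = POOL_EUR) -> dict:
    """Return a {prize name: amount in euros} mapping."""
    if sum(POOL_SHARES_PCT.values()) != 100:
        raise ValueError("pool shares must add up to 100%")
    amounts = {name: pool_eur * pct // 100 for name, pct in POOL_SHARES_PCT.items()}
    amounts.update(SPONSOR_PRIZES_EUR)
    return amounts


if __name__ == "__main__":
    amounts = prize_amounts()
    for name, eur in amounts.items():
        print(f"{name:<28}{eur:>7,}€")
    print(f"{'Total':<28}{sum(amounts.values()):>7,}€")
```

With the default pool this gives 9,000€, 4,000€, 2,000€ and 5,000€ for the four University of Nicosia prizes, and 27,000€ in total once the two sponsored prizes are added.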

1. Three Major Prizes
There will be three major Prizes for the First, Second and Third winners of the competition, who will be selected based on the performance of the participating methods according to the Overall Weighted Average (OWA) of two accuracy measures: the Mean Absolute Scaled Error (MASE) [1] and the symmetric Mean Absolute Percentage Error (sMAPE) [2]. The individual measures are calculated as follows:
$$\mathrm{sMAPE} = \frac{1}{h}\sum_{t=1}^{h}\frac{2\,|Y_t-\hat{Y}_t|}{|Y_t|+|\hat{Y}_t|}$$

$$\mathrm{MASE} = \frac{1}{h}\,\frac{\sum_{t=1}^{h}|Y_t-\hat{Y}_t|}{\frac{1}{n-m}\sum_{t=m+1}^{n}|Y_t-Y_{t-m}|}$$

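Where $Y_t$ is the value of the time series at point $t$ (post-sample values in the error terms, in-sample values in the MASE scaling term), $\hat{Y}_t$ the estimated forecast, $h$ the forecasting horizon, $n$ the number of in-sample observations and $m$ the frequency of the data (e.g., 12 for monthly series).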
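The two formulas translate directly into code. The following Python sketch (function names are hypothetical, not the official M4 code) computes sMAPE and MASE for a single series. One assumption: sMAPE is multiplied by 100 so that it is on the percentage scale of the sMAPE values reported in the M3 table below; the formula itself has no such factor.

```python
from typing import Sequence


def smape(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """sMAPE over the h post-sample points, following the formula above.

    The result is multiplied by 100 (an assumption of this sketch) so that it is
    on the percentage scale of the sMAPE values in the M3 table.
    Terms where both the actual value and the forecast are 0 are undefined.
    """
    if len(actual) != len(forecast) or not actual:
        raise ValueError("actual and forecast must have the same non-zero length h")
    h = len(actual)
    total = sum(2 * abs(y - f) / (abs(y) + abs(f)) for y, f in zip(actual, forecast))
    return 100 * total / h


def seasonal_scale(insample: Sequence[float], m: int) -> float:
    """Mean absolute seasonal difference of the in-sample data (MASE denominator)."""
    n = len(insample)
    if n <= m:
        raise ValueError("need more than m in-sample observations")
    # 0-based t = m..n-1 corresponds to t = m+1..n in the formula.
    return sum(abs(insample[t] - insample[t - m]) for t in range(m, n)) / (n - m)


def mase(insample: Sequence[float], actual: Sequence[float],
         forecast: Sequence[float], m: int) -> float:
    """MASE of one series: mean absolute error over h, divided by the seasonal scale."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("actual and forecast must have the same non-zero length h")
    mae = sum(abs(y - f) for y, f in zip(actual, forecast)) / len(actual)
    return mae / seasonal_scale(insample, m)


if __name__ == "__main__":
    history = [10, 20, 30, 40, 12, 22, 32, 42]  # toy quarterly series, m = 4
    actual, forecast = [14, 24], [12, 22]
    print(round(smape(actual, forecast), 3), mase(history, actual, forecast, m=4))
```

In the toy example every in-sample seasonal difference is 2 and both forecasts miss by 2, so MASE is exactly 1.0.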
[1] R. J. Hyndman, A. B. Koehler (2006). Another look at measures of forecast accuracy. International Journal of Forecasting, 22 (4), 679-688.
[2] S. Makridakis, M. Hibon (2000). The M3-Competition: results, conclusions and implications. International Journal of Forecasting, 16 (4), 451-476.

An example of computing the OWA is presented below, using the MASE and sMAPE of the M3 Competition methods:
- Divide all errors by those of Naïve 2 to obtain the Relative MASE and the Relative sMAPE.
- Compute the OWA by averaging the Relative MASE and the Relative sMAPE, as shown in the table below.


Forecasting Method  MASE   Relative MASE  Rank (MASE)  sMAPE   Relative sMAPE  Rank (sMAPE)  OWA    Rank (OWA)
THETA               1.395  0.827          1            12.762  0.840           1             0.834  1
ForecastPro         1.422  0.844          2            13.088  0.861           3             0.852  2
ForcX               1.441  0.855          3            13.130  0.864           4             0.859  3
Comb S-H-D          1.467  0.870          6            13.056  0.859           2             0.865  4
DAMPEN              1.466  0.870          5            13.279  0.874           5             0.872  5
AutoBox2            1.484  0.881          7            13.284  0.874           6             0.877  6
PP-Autocast         1.523  0.904          10           13.600  0.895           7             0.899  7
HOLT                1.507  0.894          8            13.777  0.906           9             0.900  8
B-J auto            1.512  0.897          9            13.819  0.909           10            0.903  9
WINTER              1.544  0.916          15           13.719  0.903           8             0.909  10
Auto-ANN            1.530  0.908          11           13.921  0.916           12            0.912  11
ARARMA              1.531  0.909          12           13.981  0.920           14            0.914  12
Flors-Pearc1        1.549  0.919          16           13.963  0.919           13            0.919  13
ROBUSTTrend         1.537  0.912          13           14.098  0.927           15            0.920  14
SMARTFCS            1.457  0.864          4            15.390  1.012           21            0.938  15
AutoBox3            1.633  0.969          19           13.913  0.915           11            0.942  16
THETAsm             1.594  0.946          18           14.286  0.940           16            0.943  17
AutoBox1            1.540  0.914          14           14.843  0.976           18            0.945  18
RBF                 1.574  0.934          17           15.464  1.017           22            0.976  19
Flors-Pearc2        1.665  0.988          21           14.742  0.970           17            0.979  20
Single              1.659  0.985          20           14.881  0.979           19            0.982  21
Naïve 2             1.685  1.000          22           15.201  1.000           20            1.000  22
Naïve 1             1.787  1.060          23           15.701  1.033           23            1.047  23

Note that MASE and sMAPE are first estimated per series by averaging the errors computed for each forecasting horizon, and then averaged again across the 3,003 time series of the M3 Competition to obtain their value for the whole dataset. OWA, on the other hand, is computed only once, at the end, for the whole sample, as shown in the table above.
In the above example, Theta has the smallest OWA and would have won the first prize; ForecastPro, the second most accurate, would have won the second prize; and ForcX, the third most accurate, would have won the third prize.
The code for computing the OWA is available on GitHub.
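The OWA steps and the aggregation order described above can be sketched in Python as follows. The function name and its argument layout (one value per series, already averaged over the horizon) are assumptions for illustration; this is not the official OWA code on GitHub. Because the table's MASE and sMAPE values are rounded to three decimals, recomputing relative errors from them can differ in the last digit (e.g., 1.395/1.685 ≈ 0.828, while the table lists 0.827), but the resulting OWA values for Theta and ForecastPro still agree with the table.

```python
from statistics import mean
from typing import Sequence


def owa(method_mase: Sequence[float], method_smape: Sequence[float],
        naive2_mase: Sequence[float], naive2_smape: Sequence[float]) -> float:
    """Overall Weighted Average of a method relative to Naïve 2.

    Each argument holds one value per series, already averaged over the
    forecasting horizon. The measures are averaged across series first and
    OWA is then computed once for the whole sample.
    """
    relative_mase = mean(method_mase) / mean(naive2_mase)
    relative_smape = mean(method_smape) / mean(naive2_smape)
    return (relative_mase + relative_smape) / 2


if __name__ == "__main__":
    # Dataset-level M3 values from the table, passed as one-element lists.
    naive2 = ([1.685], [15.201])
    print(round(owa([1.395], [12.762], *naive2), 3))  # THETA -> 0.834
    print(round(owa([1.422], [13.088], *naive2), 3))  # ForecastPro -> 0.852
```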

2. Student Prize
A prize will be awarded to the student whose method performs best, according to OWA, among all student competitors.

3. Full Reproducibility Prize
The prerequisite for the Full Reproducibility Prize is that the code used for generating the forecasts is put on GitHub no later than 10 days after the end of the competition (i.e., by the 10th of June, 2018), together with instructions on how to exactly reproduce the submitted M4 forecasts. Companies providing forecasting services and those claiming proprietary software are exempt from this requirement. In this way, individuals and companies will be able to use the code and the instructions provided, crediting the person or group that developed them, to improve their organizational forecasts.
Companies providing forecasting services and those claiming proprietary software will have to provide the organizers with a detailed description of how their forecasts were made and a source or execution file for reproducing their forecasts for 100 randomly selected series. Given the critical importance of objectivity and replicability, such a description and file will be mandatory for participating in the Competition. An execution file can be submitted if the source program needs to be kept confidential or, alternatively, a source program with a termination date for running it.
The code for reproducing the results of the 4Theta method, submitted by the Forecasting & Strategy Unit,
was put on GitHub on 21-12-2017. This method will not be considered for any of the Prizes.

4. Prediction Intervals Prize
The M4 Competition adopts a 95% Prediction Interval (PI) for estimating the uncertainty around the point forecasts. The performance of the generated PIs will be evaluated using the Mean Scaled Interval Score (MSIS) [3] as follows:
$$\mathrm{MSIS} = \frac{1}{h}\,\frac{\sum_{t=1}^{h}\left[(U_t-L_t)+\frac{2}{a}(L_t-Y_t)\,\mathbf{1}\{Y_t<L_t\}+\frac{2}{a}(Y_t-U_t)\,\mathbf{1}\{Y_t>U_t\}\right]}{\frac{1}{n-m}\sum_{t=m+1}^{n}|Y_t-Y_{t-m}|}$$
Where $L_t$ and $U_t$ are the lower and upper bounds of the prediction interval, $Y_t$ are the future observations of the series (in-sample observations in the scaling term), $a$ is the significance level and $\mathbf{1}\{\cdot\}$ is the indicator function (equal to 1 if the condition in braces holds, i.e., if $Y_t$ falls below $L_t$ or above $U_t$, and 0 otherwise). Given that forecasters will be asked to generate 95% prediction intervals, $a$ is set to 0.05.
An example of computing the MSIS is presented below, using the prediction intervals generated by two different methods for 18-step-ahead forecasts:
- A penalty is calculated for each method at the points where the future values fall outside the specified bounds.
- The width of the prediction interval is added to the penalty, if any, to obtain the Interval Score (IS).
- The IS values at the individual points are averaged to obtain the Mean Interval Score (MIS).
- MIS is scaled by dividing it by the mean absolute seasonal difference of the series (here 200) to obtain the MSIS.
- After estimating the MSIS for all the M4 Competition series, its average value is computed to evaluate the total performance of the method.

[3] T. Gneiting, A. E. Raftery (2007). Strictly Proper Scoring Rules, Prediction, and Estimation. Journal of the American Statistical Association, 102 (477), 359-378.

L1, U1 and L2, U2 denote the lower and upper bounds of the prediction intervals of methods 1 and 2, and Y the actual future value.

Horizon  L1    U1    L2    U2    Y     Penalty1  Penalty2  IS1    IS2
1        289   938   297   865   654   0         0         649    568
2        266   923   304   873   492   0         0         657    569
3        313   992   312   880   171   5680      5640      6359   6208
4        238   949   319   888   342   0         0         711    569
5        224   1008  327   895   591   0         0         784    568
6        209   1014  334   903   672   0         0         805    569
7        206   1040  342   910   465   0         0         834    568
8        175   1041  349   918   255   0         3760      866    4329
9        164   1067  357   926   864   0         0         903    569
10       150   1078  364   933   768   0         0         928    569
11       138   1094  372   941   672   0         0         956    569
12       120   1104  379   948   519   0         0         984    569
13       109   1121  387   956   519   0         0         1012   569
14       96    1133  395   963   591   0         0         1037   568
15       83    1146  402   971   480   0         0         1063   569
16       70    1157  410   978   564   0         0         1087   568
17       58    1170  417   986   579   0         0         1112   569
18       46    1182  425   993   423   0         80        1136   648
MIS                                                         1216   1095
MSIS                                                        6.08   5.48
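The steps above can be reproduced with a short Python sketch. The function name `msis` and its arguments are our own; `scale` is passed in directly (200 in the example), whereas for a real M4 series it would be the in-sample mean absolute seasonal difference, i.e., the same denominator as in MASE. Run on the 18-step example, it gives MIS values of about 1216 and 1095 and MSIS values of 6.08 and 5.48, matching the table.

```python
from typing import Sequence


def msis(lower: Sequence[float], upper: Sequence[float],
         actual: Sequence[float], scale: float, a: float = 0.05) -> float:
    """MSIS of one series.

    `scale` is the mean absolute seasonal difference of the in-sample data,
    i.e. the same denominator as in MASE. With scale=1 the result is the MIS.
    """
    if not (len(lower) == len(upper) == len(actual)) or not actual:
        raise ValueError("lower, upper and actual must have the same non-zero length h")
    penalty_rate = 2 / a
    total = 0.0
    for lo, up, y in zip(lower, upper, actual):
        score = up - lo                       # interval width
        if y < lo:
            score += penalty_rate * (lo - y)  # actual below the lower bound
        elif y > up:
            score += penalty_rate * (y - up)  # actual above the upper bound
        total += score                        # interval score (IS) at this point
    return total / len(actual) / scale


# The 18-step example from the table above.
Y = [654, 492, 171, 342, 591, 672, 465, 255, 864, 768, 672, 519, 519, 591, 480, 564, 579, 423]
L1 = [289, 266, 313, 238, 224, 209, 206, 175, 164, 150, 138, 120, 109, 96, 83, 70, 58, 46]
U1 = [938, 923, 992, 949, 1008, 1014, 1040, 1041, 1067, 1078, 1094, 1104, 1121, 1133, 1146, 1157, 1170, 1182]
L2 = [297, 304, 312, 319, 327, 334, 342, 349, 357, 364, 372, 379, 387, 395, 402, 410, 417, 425]
U2 = [865, 873, 880, 888, 895, 903, 910, 918, 926, 933, 941, 948, 956, 963, 971, 978, 986, 993]

if __name__ == "__main__":
    for name, lo, up in (("Method 1", L1, U1), ("Method 2", L2, U2)):
        print(name, round(msis(lo, up, Y, scale=1)), round(msis(lo, up, Y, scale=200), 2))
```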

Forecasting Horizons
The number of forecasts required by each method is 6 for yearly data, 8 for quarterly, 18 for monthly, 13
for weekly, 14 for daily and 48 for hourly. The accuracy measures are computed for each horizon
separately and then combined to cover, in a weighted fashion, all horizons together for each of the two
accuracy measures (MASE and sMAPE).
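As a sketch of how a participant might check a submission against the required horizons, the following Python snippet hard-codes the horizons listed above. The Horizon column of M4-Info.csv remains the authoritative source; the mapping from M4id prefixes other than "Y" to frequencies and the helper names are assumptions for illustration.

```python
# Forecast horizon h per data frequency, as listed above.
HORIZON = {
    "Yearly": 6,
    "Quarterly": 8,
    "Monthly": 18,
    "Weekly": 13,
    "Daily": 14,
    "Hourly": 48,
}

# Assumed mapping from the first letter of an M4id (e.g. "Y100") to frequency.
ID_PREFIX = {"Y": "Yearly", "Q": "Quarterly", "M": "Monthly",
             "W": "Weekly", "D": "Daily", "H": "Hourly"}


def horizon_for(m4id: str) -> int:
    """Number of forecasts required for a series, inferred from its M4id prefix."""
    try:
        return HORIZON[ID_PREFIX[m4id[0]]]
    except (IndexError, KeyError):
        raise ValueError(f"unrecognised M4id: {m4id!r}") from None


def check_submission(m4id: str, forecasts) -> None:
    """Raise ValueError if a forecast vector does not have the required length."""
    expected = horizon_for(m4id)
    if len(forecasts) != expected:
        raise ValueError(f"{m4id}: expected {expected} forecasts, got {len(forecasts)}")


if __name__ == "__main__":
    print(horizon_for("Y100"), horizon_for("H1"))
    check_submission("M5", [0.0] * 18)  # passes silently
```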

The dataset
The M4 consists of 100,000 time series of Yearly, Quarterly, Monthly and Other (Weekly, Daily and Hourly)
data. The minimum number of observations is 13 for yearly, 16 for quarterly, 42 for monthly, 80 for
weekly, 93 for daily and 700 for hourly series.
The 100,000 time series of the dataset come mainly from the Economic, Finance, Demographics and
Industry areas, while also including data from Tourism, Trade, Labor and Wage, Real Estate,
Transportation, Natural Resources and the Environment.
The M4 Competition series, like those of the M1 and M3, aim to represent the real world as closely as possible. The series were selected randomly from a database of 900,000 series on December 28, 2017.

Professor Makridakis chose the seed number for generating the random sample that determined the M4 Competition data. Some predefined filters were applied beforehand to achieve certain desired characteristics, such as the length of the series, the percentage of Yearly, Quarterly, Monthly, Weekly, Daily and Hourly data, and their type (Micro, Macro, Finance, Industry, Demographic, Other).
Below is the number of time series based on their frequency and type:
Frequency  Demographic  Finance  Industry  Macro   Micro   Other  Total
Yearly     1,088        6,519    3,716     3,903   6,538   1,236  23,000
Quarterly  1,858        5,305    4,637     5,315   6,020   865    24,000
Monthly    5,728        10,987   10,017    10,016  10,975  277    48,000
Weekly     24           164      6         41      112     12     359
Daily      10           1,559    422       127     1,476   633    4,227
Hourly     0            0        0         0       0       414    414
Total      8,708        24,534   18,798    19,402  25,121  3,437  100,000
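The counts in the table can be cross-checked: each row should sum to its frequency total, each column to its type total, and both to 100,000 series overall. A small Python sketch of that check (the names are ours, not part of the dataset files):

```python
# Number of series per frequency (rows) and type (columns), from the table above.
TYPES = ["Demographic", "Finance", "Industry", "Macro", "Micro", "Other"]
COUNTS = {
    "Yearly":    [1088, 6519, 3716, 3903, 6538, 1236],
    "Quarterly": [1858, 5305, 4637, 5315, 6020, 865],
    "Monthly":   [5728, 10987, 10017, 10016, 10975, 277],
    "Weekly":    [24, 164, 6, 41, 112, 12],
    "Daily":     [10, 1559, 422, 127, 1476, 633],
    "Hourly":    [0, 0, 0, 0, 0, 414],
}
ROW_TOTALS = {"Yearly": 23000, "Quarterly": 24000, "Monthly": 48000,
              "Weekly": 359, "Daily": 4227, "Hourly": 414}
COLUMN_TOTALS = dict(zip(TYPES, [8708, 24534, 18798, 19402, 25121, 3437]))


def check_table() -> int:
    """Verify row and column totals; return the grand total."""
    for freq, row in COUNTS.items():
        if sum(row) != ROW_TOTALS[freq]:
            raise ValueError(f"row total mismatch for {freq}")
    for j, name in enumerate(TYPES):
        if sum(row[j] for row in COUNTS.values()) != COLUMN_TOTALS[name]:
            raise ValueError(f"column total mismatch for {name}")
    grand = sum(ROW_TOTALS.values())
    if grand != sum(COLUMN_TOTALS.values()):
        raise ValueError("row and column grand totals differ")
    return grand


if __name__ == "__main__":
    print(check_table())
```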

You can download the dataset here. There you may also find additional information regarding the type,
the frequency and the number of forecasts required per series.
In brief, the M4-Info.csv file provides the following information:
- M4id: The id of the time series, used as a reference. For instance, “Y100” corresponds to the 100th series of the Yearly data.
- Category: The type of the time series (e.g., Macro, Micro, Finance, etc.).
- Frequency: The frequency of the time series. This corresponds to the m value used for estimating MASE. Note that competitors may still consider different or multiple seasonalities.
- Horizon: The forecasting horizon, i.e., the number of periods ahead for which the competitors need to generate forecasts.
- SP: The Seasonal Period (e.g., Yearly, Monthly, Weekly, etc.).

The M4DataSet.rar file contains the historical data for training a forecasting model. A separate file is given
per data frequency. The first row displays the M4id, while the rest contain the historical data. No timestamp is provided.

The Benchmarks
There will be ten benchmark methods: eight used in the M3 Competition and two additional ones based on machine learning (ML) concepts. As these methods are well known, readily available and straightforward to apply, new methods proposed in the M4 Competition must provide superior accuracy in order to be adopted and used in practice (also taking into account the additional computational time a more accurate method may require, compared with the benchmarks, whose computational requirements are minimal).
1. Naïve 1: $F_{t+i} = Y_t$, $i = 1, 2, 3, \ldots, h$.
2. Seasonal Naïve: forecasts are equal to the last known observation of the same period.
3. Naïve 2: like Naïve 1, but the data are seasonally adjusted, if needed, by applying classical multiplicative decomposition (R stats package). When using the R package, a 90% autocorrelation test is performed to decide whether the data are seasonal.
4. Simple Exponential Smoothing (S): ses() function from v8.2 of the forecast package for R. Seasonality is considered as in Naïve 2.
5. Holt's Exponential Smoothing (H): holt() function from v8.2 of the forecast package for R. Seasonality is considered as in Naïve 2.
6. Damped Exponential Smoothing (D): holt() function with damped = TRUE, from v8.2 of the forecast package for R. Seasonality is considered as in Naïve 2.
7. Combining S-H-D: the arithmetic average of methods 4, 5 and 6.
8. Theta: as applied to the M3 Competition data (θ = 2, seasonal adjustment as in Naïve 2, and SES applied using the ses() function from v8.2 of the forecast package for R).
9. MLP: a multilayer perceptron of a very basic architecture and parameterization (developed in Python using the scikit-learn library v0.19.1, available on GitHub).
10. RNN: a recurrent network of a very basic architecture and parameterization (developed in Python using the Keras v2.0.9 and TensorFlow v1.4.0 libraries, available on GitHub).

The code for generating the forecasts of the benchmarks mentioned above is available on GitHub.
Note that the benchmarks are not eligible for a prize, meaning that the total amount of the prizes will be distributed among the competing participants even if some benchmarks perform better than the forecasts submitted by the participants.
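To make benchmarks 1 to 3 more concrete, the following pure-Python sketch implements Naïve 1, Seasonal Naïve and Naïve 2. It is not the official benchmark code published on GitHub, and all function names are hypothetical. Two details are assumptions of this sketch: the seasonality test compares the lag-m autocorrelation r_m with the 90% bound 1.645 · sqrt((1 + 2 Σ_{k=1}^{m-1} r_k²) / n), and series with fewer than 3m observations are treated as non-seasonal. The seasonal indices follow the classical multiplicative recipe (centred moving average, 2×m for even m, then per-position ratios normalised to a mean of 1) rather than being a line-by-line port of R's decompose().

```python
from statistics import mean
from typing import List, Sequence


def naive1(y: Sequence[float], h: int) -> List[float]:
    """Naïve 1: every forecast equals the last observation."""
    return [y[-1]] * h


def seasonal_naive(y: Sequence[float], m: int, h: int) -> List[float]:
    """Seasonal Naïve: repeat the last observed value of the same period."""
    last_season = list(y[-m:])
    return [last_season[i % m] for i in range(h)]


def acf(y: Sequence[float], k: int) -> float:
    """Sample autocorrelation at lag k (undefined for a constant series)."""
    mu = mean(y)
    denom = sum((v - mu) ** 2 for v in y)
    num = sum((y[t] - mu) * (y[t - k] - mu) for t in range(k, len(y)))
    return num / denom


def is_seasonal(y: Sequence[float], m: int) -> bool:
    """90% autocorrelation test at lag m (assumed form; see the lead-in)."""
    if m <= 1 or len(y) < 3 * m:
        return False
    s = sum(acf(y, k) ** 2 for k in range(1, m))
    limit = 1.645 * ((1 + 2 * s) / len(y)) ** 0.5
    return abs(acf(y, m)) > limit


def seasonal_indices(y: Sequence[float], m: int) -> List[float]:
    """Classical multiplicative decomposition: one seasonal index per position."""
    half = m // 2
    ratios: List[List[float]] = [[] for _ in range(m)]
    for t in range(half, len(y) - half):
        window = y[t - half:t + half + 1]
        if m % 2 == 0:
            # 2 x m centred moving average: half weight on the two end points.
            trend = (0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]) / m
        else:
            trend = mean(window)
        ratios[t % m].append(y[t] / trend)
    raw = [mean(r) for r in ratios]
    norm = mean(raw)
    return [r / norm for r in raw]  # normalise so the indices average to 1


def naive2(y: Sequence[float], m: int, h: int) -> List[float]:
    """Naïve 2: Naïve 1 on seasonally adjusted data, then re-seasonalised."""
    if not is_seasonal(y, m):
        return naive1(y, h)
    idx = seasonal_indices(y, m)
    n = len(y)
    last_adjusted = y[-1] / idx[(n - 1) % m]
    return [last_adjusted * idx[(n + i) % m] for i in range(h)]


if __name__ == "__main__":
    quarterly = [10, 20, 30, 40] * 6
    print(naive1(quarterly, 4), seasonal_naive(quarterly, 4, 4), naive2(quarterly, 4, 4))
```

On a perfectly periodic quarterly series such as [10, 20, 30, 40] repeated six times, the test flags seasonality and Naïve 2 reproduces the last seasonal pattern (up to floating-point rounding), while for m = 1 it falls back to Naïve 1.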

Factors Affecting Forecasting Accuracy
The M4 will provide a unique opportunity to identify the factors affecting forecasting accuracy. With 100,000 series, an average of 12 forecasts for each, more than 100 forecasting methods and 2 accuracy measures, there will be about a quarter of a billion data points. Data analytics will be applied to discover patterns and relationships, and the findings will be used to enrich our understanding of forecasting accuracy and the factors that affect it.
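The quarter-billion estimate is simply the product of the four quantities above, taking 100 methods as the lower bound:

$$100{,}000 \times 12 \times 100 \times 2 = 2.4 \times 10^{8} \approx 0.25 \times 10^{9}$$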

Author: Spyros Makridakis (PDF, 7 pages, created 16 March 2018)
