Handout Instructions

RMOTR Data Science - Final project

Analyzing FIFA 19 player dataset

We'll use the FIFA 19 dataset from Kaggle (https://www.kaggle.com/karangadiya/fifa19/), which includes thorough information about the FIFA 19 game. The data was scraped from https://sofifa.com/.

There are more than 80 features per player, including attributes (e.g. shooting, passing and defending skills, value and wage, release clause and others).

For a detailed description of the columns of the dataset, check the associated notebook Columns detail.ipynb.

This is an example of the page that was scraped to generate the data: https://sofifa.com/player/158023

Dataset

The data is contained in /data/fifa19.zip. There are 89 columns in the dataset. There's a detailed description of it in the associated notebook Handout instructions.ipynb.

Initial cleaning

This dataset is fairly well structured and cleaned, but there are still some tasks to do. We'll give you a few initial pointers to get things started, and you can finish the process according to your own criteria.

1. Parse Value, Wage and Release Clause to make them numeric:

These fields have a "human" format (e.g. €226.5M). Your job is to turn them into numeric fields. M means millions and K means thousands, and you're in charge of working out the real value. For example, €226.5M is actually 226.5 * 1_000_000 = 226_500_000.0.
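The conversion described above can be sketched as a small helper. This is one possible approach, not the expected solution: the function name `parse_money` is made up for illustration, and it assumes every non-missing entry starts with `€` and ends with an optional `M` or `K` suffix.

```python
def parse_money(text):
    """Convert a string such as '€226.5M' or '€565K' into a float amount.

    Anything that is not a string (for example a missing value) becomes NaN.
    """
    if not isinstance(text, str):
        return float("nan")
    cleaned = text.replace("€", "").strip()
    if not cleaned:
        return float("nan")
    multipliers = {"M": 1_000_000, "K": 1_000}
    suffix = cleaned[-1].upper()
    if suffix in multipliers:
        # Strip the suffix and scale the remaining number
        return float(cleaned[:-1]) * multipliers[suffix]
    # No suffix: the amount is already expressed in plain units
    return float(cleaned)
```

With a DataFrame `df` loaded from the dataset, the helper could be applied column by column, for example `df["Value"] = df["Value"].map(parse_money)`.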

2. Create a new column SimplifiedPosition:

The SimplifiedPosition column should hold the player's position simplified into one of these possible values:

Goalkeeper: GK

Defender: LWB, RWB, LB, LCB, CB, RCB, RB

Midfielder: LAM, CAM, RAM, LM, LCM, CM, RCM, RM, LDM, CDM, RDM

Attacker: LS, ST, RS, LW, LF, CF, RF, RW

According to our calculations, there should be 2025 GKs, 5866 DFs, 6838 MFs and 3418 ATs. If your numbers are different, please explain why.
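One way to express the grouping above is a lookup table built from the four lists. The names `POSITION_GROUPS` and `simplify_position` are hypothetical, and returning `None` for unknown or missing positions is a design choice of this sketch rather than a requirement of the handout.

```python
# The four groups exactly as listed in the handout
POSITION_GROUPS = {
    "Goalkeeper": ["GK"],
    "Defender": ["LWB", "RWB", "LB", "LCB", "CB", "RCB", "RB"],
    "Midfielder": ["LAM", "CAM", "RAM", "LM", "LCM", "CM", "RCM", "RM",
                   "LDM", "CDM", "RDM"],
    "Attacker": ["LS", "ST", "RS", "LW", "LF", "CF", "RF", "RW"],
}

# Invert the table so each detailed position points at its group
POSITION_TO_GROUP = {
    position: group
    for group, positions in POSITION_GROUPS.items()
    for position in positions
}


def simplify_position(position):
    """Return the simplified group for a position, or None if it is unknown."""
    if not isinstance(position, str):
        return None  # missing positions are not strings
    return POSITION_TO_GROUP.get(position.strip())
```

Applied to the dataset, this could look like `df["SimplifiedPosition"] = df["Position"].map(simplify_position)`, followed by `df["SimplifiedPosition"].value_counts()` to compare against the counts quoted above.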

3. Parse Joined and Contract Valid Until to Timestamps

Remember to use the pandas function pd.to_datetime().
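As a rough sketch of how `pd.to_datetime()` can be used here: the sample strings and the `"%b %d, %Y"` format below are assumptions about how the Joined column looks, so inspect the real values first. The helper name `parse_dates` is made up for this example.

```python
import pandas as pd


def parse_dates(column, fmt=None):
    """Parse a column of date strings into pandas Timestamps.

    errors="coerce" turns entries that cannot be parsed into NaT
    instead of raising an exception.
    """
    return pd.to_datetime(column, format=fmt, errors="coerce")


# Hypothetical sample values, assumed to resemble the Joined column
sample_joined = pd.Series(["Jul 1, 2004", "Jan 15, 2018", None])
joined = parse_dates(sample_joined, fmt="%b %d, %Y")
```

If Contract Valid Until turns out to mix several formats, a single format string will not cover every row; counting the NaT values after parsing is a quick way to spot that.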

4. The rest is for you to decide

There might be more things to clean in this dataset. If you find other things to improve, please document them so you can present them to the rest of your classmates.

EDA: Exploratory Data Analysis

The analysis phase will include a couple of predefined questions given by us, as a way to get things started. We estimate these predefined questions should take around 30% of your time. The other 70% is more "exploratory": you're in charge of coming up with questions, findings and conclusions.

1. Which are the 10 highest paid players?

To kick things off, we'll start with a simple one. Identify and plot the top 10 highest paid players, defined by Wage. Here's an illustrative example: [figure not reproduced]
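A minimal sketch of the selection step, assuming Wage has already been converted to a number and that the player name lives in a column called Name. The toy rows are invented purely to make the snippet runnable, and `top_paid` is a hypothetical helper name.

```python
import pandas as pd


def top_paid(players, n=10):
    """Return the n players with the largest numeric Wage, highest first."""
    return players.nlargest(n, "Wage")[["Name", "Wage"]]


# Toy data for illustration only, not taken from the dataset
toy = pd.DataFrame({
    "Name": ["Player A", "Player B", "Player C"],
    "Wage": [565_000.0, 405_000.0, 455_000.0],
})
top_two = top_paid(toy, n=2)
```

The result can be plotted directly with pandas, for example `top_paid(df).plot.barh(x="Name", y="Wage")`.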

2. Value vs Wage

Some players have a low market value but very good salaries; for others it is the other way around. Some clubs prefer to pay a high value but lower weekly wages.

Create a regression plot (https://seaborn.pydata.org/generated/seaborn.regplot.html) comparing Value to Wage. Can you tell which players are "cheaper" compared to their salary, or the other way around?

3. Value vs Overall

Create a figure relating the value of a player to its "overall rating". You can use a simple scatter plot or a more advanced analysis like the one below: [figure not reproduced]

4. What are the top 5 most "valuable" nations?

What are the top 5 nations for which the sum of player values is the highest?

5. What are the top 5 most "valuable" clubs?

Calculate the total value of each club, defined as the sum of each player's value. Display the top 5.
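Questions 4 and 5 are the same aggregation over a different grouping column, so one hedged sketch can cover both. It assumes Value is already numeric and that the grouping columns are named Nationality and Club; the toy rows are invented and `total_value_by` is a hypothetical helper name.

```python
import pandas as pd


def total_value_by(players, group_column, n=5):
    """Sum Value within each group and keep the n largest totals."""
    return players.groupby(group_column)["Value"].sum().nlargest(n)


# Toy data for illustration only
toy = pd.DataFrame({
    "Nationality": ["X", "X", "Y", "Z"],
    "Club": ["Club 1", "Club 2", "Club 1", "Club 2"],
    "Value": [10.0, 20.0, 25.0, 5.0],
})
by_nation = total_value_by(toy, "Nationality", n=2)
by_club = total_value_by(toy, "Club", n=2)
```

Players with a missing club are dropped by `groupby` by default, which is worth keeping in mind when interpreting the totals.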

6. Most valuable clubs on the 90th percentile.

Instead of calculating the total value of the club, we'll explore the 90th percentiles of values. Display the top 5 clubs with highest salaries in the 90th percentile.
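A per-club percentile can be computed with `groupby` and `quantile`. Since the text mentions both values and salaries, the column is left as a parameter in this sketch; the helper name `top_clubs_by_quantile` and the toy rows are assumptions made for illustration.

```python
import pandas as pd


def top_clubs_by_quantile(players, column="Wage", q=0.9, n=5):
    """Compute the q-th quantile of a column per club and keep the n largest."""
    return players.groupby("Club")[column].quantile(q).nlargest(n)


# Toy data for illustration only
toy = pd.DataFrame({
    "Club": ["Club 1", "Club 1", "Club 1", "Club 2", "Club 2"],
    "Wage": [100.0, 200.0, 300.0, 50.0, 60.0],
})
ranking = top_clubs_by_quantile(toy, column="Wage", q=0.9, n=5)
```

By default pandas interpolates linearly between observations, so a club's 90th percentile does not have to match any single player's figure.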

7. What players have the highest potential to grow?

The column Potential shows the maximum overall rating a player can reach (with the correct training). Overall shows the current rating. We'll define "Growth Potential" as Potential - Overall. For example, Vinícius Júnior has an Overall of 77 but a Potential of 92, so he has 15 points to grow. What are the 10 most promising players, defined as the ones with the highest "Growth Potential"?

8. What are the top 5 most promising players, with a potential over 90?

Similar to the previous point, but restricted to players with a potential over 90.
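Questions 7 and 8 share the same computation, so a single sketch with an optional filter can serve both. The helper name `most_promising` and the Growth column are made up for this example. The Vinícius Júnior row uses the figures quoted above; the other two rows are invented.

```python
import pandas as pd


def most_promising(players, n=10, min_potential=None):
    """Rank players by Growth = Potential - Overall, largest first.

    If min_potential is given, keep only players whose Potential
    is strictly greater than it.
    """
    ranked = players.assign(Growth=players["Potential"] - players["Overall"])
    if min_potential is not None:
        ranked = ranked[ranked["Potential"] > min_potential]
    return ranked.nlargest(n, "Growth")[["Name", "Overall", "Potential", "Growth"]]


toy = pd.DataFrame({
    "Name": ["Vinícius Júnior", "Player B", "Player C"],
    "Overall": [77, 88, 60],
    "Potential": [92, 91, 80],
})
top_growth = most_promising(toy, n=2)
top_elite = most_promising(toy, n=5, min_potential=90)
```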

9. What clubs have the largest number of loaned players?

Some clubs loan more than others. What are the top 10 clubs with the most loaned players?

10. Create a Radar chart with the most important features of the player.

A Radar or Spider chart (https://en.wikipedia.org/wiki/Radar_chart) lets you plot multiple features at the same time. Define a function plot_radar(player_names, features) that receives the players and features to plot and creates the radar charts.

11. Create a correlation heatmap of the most important skills of players.
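The numbers behind a correlation heatmap come from `DataFrame.corr()`. The sketch below computes only the matrix; the three skill columns are an arbitrary choice, the helper name `skill_correlations` is hypothetical, and the toy values are invented so the result is easy to check by eye.

```python
import pandas as pd


def skill_correlations(players, skills):
    """Return the pairwise Pearson correlation matrix of the chosen skill columns."""
    return players[skills].corr()


# Toy data for illustration only: Dribbling rises with Finishing, Marking falls
toy = pd.DataFrame({
    "Finishing": [80, 60, 40, 20],
    "Dribbling": [85, 65, 45, 25],
    "Marking": [30, 50, 70, 90],
})
corr = skill_correlations(toy, ["Finishing", "Dribbling", "Marking"])
```

The resulting matrix can be passed to seaborn for drawing, for example `sns.heatmap(corr, annot=True)`.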

Your EDA

ML & Predictions

Time for some Machine Learning! We'll give you a recommendation of the easiest thing to predict, which is the Overall value of a player. If you want to create other models (for example, classifying the SimplifiedPosition or estimating the Value/Wage), you're welcome to do it.

Predicting player's Overall (regression)

Create a model that predicts the overall rating of a player. What are the most relevant variables when it comes to predicting that overall value, and why?
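As a starting point only, here is a bare-bones linear regression fitted with NumPy least squares. It is a stand-in for whichever modelling library you prefer, not a recommended final model. The function name `fit_overall` and the toy data are invented, and the target is built from known weights so the fit can be checked.

```python
import numpy as np


def fit_overall(features, overall):
    """Least-squares fit of overall ≈ intercept + features @ weights.

    Returns an array with the intercept first, then one weight per feature column.
    """
    features = np.asarray(features, dtype=float)
    # Prepend a column of ones so the first coefficient acts as the intercept
    design = np.column_stack([np.ones(len(features)), features])
    target = np.asarray(overall, dtype=float)
    coefficients, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coefficients


# Toy data for illustration only: two skill columns and a target
# constructed as 10 + 0.5 * first + 0.4 * second
toy_features = np.array([[60, 70], [80, 50], [75, 85], [90, 65]])
toy_overall = 10 + 0.5 * toy_features[:, 0] + 0.4 * toy_features[:, 1]
coefficients = fit_overall(toy_features, toy_overall)
```

Raw weights are only comparable across variables when the features share a scale, so standardizing the inputs before fitting makes the "most relevant variables" question easier to answer.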

Optional

Finally, if you have extra time, here are optional points to work on:

1. Backfill missing positions (scraping)

Some players have their Position missing. The https://sofifa.com/ website has positions for them, so it's probably just the result of poor scraping. Use your scraping techniques (beautifulsoup recommended) to fill those missing positions.

2. Create an API endpoint to predict a player's overall

Using the regression created in the previous point, create a simple API endpoint that receives the features you're analyzing and predicts the overall value of the player.
