Evaluation metrics play a critical role in the machine learning ecosystem. For machine learning products in particular, they are like a heartbeat: they show how healthy the model is and how well it performs in real life, and they are often the only numbers that decision makers care about.

The definition and implementation of an evaluation metric depend heavily on the application and change from one data product to another. In this post, I introduce Mean Directional Accuracy (MDA) and show how to calculate it in Spark.

MDA is used in time series prediction, where it compares the forecast direction (upward or downward) to the actual realized direction. It can be interpreted as the probability that the forecasting method under study detects the correct direction of the time series. It is a widely used metric [1] in economics applications, where the economist is often interested only in the directional movement of the variable of interest. For example, in macroeconomics a monetary authority wants to know the direction of inflation so that it can raise interest rates if inflation is predicted to rise and lower them if it is predicted to drop. Another example comes from financial planning, where the user wants to know whether demand is trending upward or downward.

For a time series \(A_t\) and its prediction \(F_t\), the Mean Directional Accuracy is calculated as follows:

\[\frac{1}{N} \sum_t \mathbf{1}_{ \operatorname{sign}(A_t-A_{t-1}) == \operatorname{sign}(F_t-F_{t-1})} \]
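Before moving to Spark, the formula can be sketched in plain Python. This is only an illustration: the function names are made up for this example, \(N\) is taken to be the number of consecutive pairs, and a zero change is treated as a downward move (the same convention used by the sign function later in this post).

```python
def sign(x):
    """Return 1.0 for a positive change and -1.0 otherwise (zero counts as down)."""
    return 1.0 if x > 0 else -1.0


def mean_directional_accuracy(actual, forecast):
    """Share of consecutive steps where actual and forecast move the same way."""
    if len(actual) != len(forecast):
        raise ValueError("actual and forecast must have the same length")
    if len(actual) < 2:
        raise ValueError("need at least two observations")
    hits = [
        1.0 if sign(actual[t] - actual[t - 1]) == sign(forecast[t] - forecast[t - 1]) else 0.0
        for t in range(1, len(actual))
    ]
    return sum(hits) / len(hits)


# Illustrative values only
actual = [4.470303, 4.734335, 4.826502, 4.981746, 4.79081]
forecast = [5.012966, 4.404831, 4.978600, 5.035932, 4.853806]
print(mean_directional_accuracy(actual, forecast))  # 0.75
```

Of the four transitions in this short series, the forecast moves in the same direction as the actual value in three, which gives 0.75.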

### Toy example

As a toy example, let’s use a GDP time series dataset that covers nine major countries from 1950 to 1983 [2]. For each country and year, a GDP prediction is also available. For example, the following plot shows the yearly actual and predicted GDP for the United States:

The goal is to calculate the MDA metric for GDP prediction per country in Spark.

*Note that the data is not big enough to need Spark for the calculation. This is just a toy example to show how to use a Spark window for such a problem.*

After loading the data into a Spark dataframe, we have the following structure, where each row holds the GDP and the GDP prediction for one year and one country.

```
df.filter(df.country=="USA").show(5)
+-------+--------+-----------------+----+
|country| gdp| predicted|year|
+-------+--------+-----------------+----+
| USA|4.470303|5.012966057409855|1950|
| USA|4.734335|4.404831278549317|1951|
| USA|4.826502|4.978599656728077|1952|
| USA|4.981746|5.035932340179457|1953|
| USA| 4.79081|4.853806067158911|1954|
+-------+--------+-----------------+----+
```

Looking at the MDA formula, we need to partition the dataframe by country and then, for each year, compare the direction of the GDP and of its prediction from the previous year to the current year. For Spark dataframes, the window specification in `pyspark.sql.window` is the tool for this operation [3].

We can divide the MDA metric calculation into 4 steps.

##### Step 1: window configuration

As mentioned before, the partitioning is per country. We then order the data within each partition by `year`, since we would like to compare each year's GDP to the previous year's.

```
from pyspark.sql import window
windowSpec = window.Window.partitionBy('country')\
.orderBy('year')
```

For MDA, we need the GDP and its prediction from the previous year. Therefore, the moving window only needs two rows: the current entry and the one before it. We derive this frame from `windowSpec` as follows:

```
mywindow = windowSpec.rowsBetween(-1,0)
```

##### Step 2: add last year column to each row

To calculate the direction of the GDP and its prediction, we need to add the previous year's GDP and prediction to each row of the dataframe. To do that, we pick the first element of each window (corresponding to the previous year's data) and bind it to the current row.

```
import pyspark.sql.functions as func
df_w = df.withColumn("gdp_last_year", func.first(df.gdp).over(mywindow))\
.withColumn("predicted_last_year", func.first(df.predicted).over(mywindow))
```

Let’s look at the new dataframe:

```
+-------+--------+-------------+----+-------------+-------------------+
|country| gdp| predicted|year|gdp_last_year|predicted_last_year|
+-------+--------+-------------+----+-------------+-------------------+
| USA|4.470303|5.01296605740|1950| 4.470303| 5.012966057409855|
| USA|4.734335|4.40483127854|1951| 4.470303| 5.012966057409855|
| USA|4.826502|4.97859965672|1952| 4.734335| 4.404831278549317|
| USA|4.981746|5.03593234017|1953| 4.826502| 4.978599656728077|
| USA| 4.79081|4.85380606715|1954| 4.981746| 5.035932340179457|
+-------+--------+-------------+----+-------------+-------------------+
```
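To make the window semantics concrete, here is a plain-Python emulation (not Spark code) of taking the first element of a two-row trailing window. The helper name `first_in_trailing_window` is made up for this illustration. Note that the first row of a partition has no predecessor, so its window contains only that row and its "last year" value is the row itself, as seen for 1950 in the table above.

```python
def first_in_trailing_window(values):
    """Emulate first(...) over a frame of (previous row, current row).

    For index 0 there is no previous row, so the frame holds only the
    current row and its first element is the value itself.
    """
    return [values[max(i - 1, 0)] for i in range(len(values))]


# USA GDP values for 1950-1954, taken from the table above
gdp = [4.470303, 4.734335, 4.826502, 4.981746, 4.79081]
gdp_last_year = first_in_trailing_window(gdp)
print(gdp_last_year)
# [4.470303, 4.470303, 4.734335, 4.826502, 4.981746]
```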

##### Step 3: find the sign of GDP and GDP prediction for each year

Now we need to compare each year's GDP and its prediction with the previous year's values and find the direction. Since all of this information is available per row in the `df_w` dataframe, the operation can be done row by row. We just need to define a Spark User Defined Function (UDF) and apply it to all the rows:

```
from pyspark.sql import types

def f(x):
    # Direction of change: 1.0 for an increase, -1.0 otherwise (including no change)
    if x > 0:
        return 1.0
    else:
        return -1.0

sign_udf = func.udf(f, types.DoubleType())
df_w = df_w.withColumn("sign_gdp", sign_udf(df_w.gdp - df_w.gdp_last_year))\
    .withColumn("sign_predicted", sign_udf(df_w.predicted - df_w.predicted_last_year))
```

The updated `df_w` dataframe is as follows:

```
df_w.filter(df_w.country=="USA").show(5)
```

```
+-------+--------+-----------------+----+-------------+-------------------+--------+--------------+
|country| gdp| predicted|year|gdp_last_year|predicted_last_year|sign_gdp|sign_predicted|
+-------+--------+-----------------+----+-------------+-------------------+--------+--------------+
| USA|4.470303|5.012966057409855|1950| 4.470303| 5.012966057409855| -1.0| -1.0|
| USA|4.734335|4.404831278549317|1951| 4.470303| 5.012966057409855| 1.0| -1.0|
| USA|4.826502|4.978599656728077|1952| 4.734335| 4.404831278549317| 1.0| 1.0|
| USA|4.981746|5.035932340179457|1953| 4.826502| 4.978599656728077| 1.0| 1.0|
| USA| 4.79081|4.853806067158911|1954| 4.981746| 5.035932340179457| -1.0| -1.0|
+-------+--------+-----------------+----+-------------+-------------------+--------+--------------+
```

##### Step 4: calculate the MDA

Finally, for each country we need to apply the indicator function for each directional prediction and take the average. This can be done as follows:

```
from pyspark.sql import types

def indicator_function(x):
    # 1.0 when the actual and predicted directions agree, 0.0 otherwise
    if x:
        return 1.0
    else:
        return 0.0

## apply indicator function to each directional prediction
mda_udf = func.udf(indicator_function, types.DoubleType())
df_w = df_w.withColumn("MDA", mda_udf(df_w.sign_gdp == df_w.sign_predicted))

## MDA calculation: average the indicator per country
mda_result = df_w.groupBy('country').agg(func.avg('MDA').alias('MDA'))
```

The MDA result for all nine countries is given below:

```
+-------+------------------+
|country| MDA|
+-------+------------------+
| GREECE|0.7058823529411765|
| UK|0.6470588235294118|
| CANADA|0.6470588235294118|
|AUSTRIA|0.7352941176470589|
| USA|0.7058823529411765|
| ITALY|0.7352941176470589|
|GERMANY|0.7647058823529411|
| SWEDEN|0.7352941176470589|
| FRANCE|0.7941176470588235|
+-------+------------------+
```
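One side effect is worth checking. Because the 1950 row is compared with itself, both differences are zero, both signs come out as -1.0, and the row counts as a correct directional prediction. The plain-Python sketch below (hypothetical helper names, using only the five USA rows shown earlier) compares an average over all rows, which mirrors the steps above, with an average over year-to-year transitions only.

```python
def sign(x):
    """Return 1.0 for a positive change and -1.0 otherwise (zero counts as down)."""
    return 1.0 if x > 0 else -1.0


def mda_all_rows(actual, forecast):
    """Average over every row; row 0 is compared with itself, as in the window above."""
    hits = 0.0
    for i in range(len(actual)):
        j = max(i - 1, 0)  # previous row, or the row itself for i == 0
        if sign(actual[i] - actual[j]) == sign(forecast[i] - forecast[j]):
            hits += 1.0
    return hits / len(actual)


def mda_transitions_only(actual, forecast):
    """Average over consecutive pairs only, skipping the first row."""
    hits = 0.0
    for i in range(1, len(actual)):
        if sign(actual[i] - actual[i - 1]) == sign(forecast[i] - forecast[i - 1]):
            hits += 1.0
    return hits / (len(actual) - 1)


# USA rows for 1950-1954 from the tables above
gdp = [4.470303, 4.734335, 4.826502, 4.981746, 4.79081]
predicted = [5.012966057409855, 4.404831278549317, 4.978599656728077,
             5.035932340179457, 4.853806067158911]

print(mda_all_rows(gdp, predicted))          # 0.8
print(mda_transitions_only(gdp, predicted))  # 0.75
```

With 34 years per country the effect is small, but it always nudges the score upward. If the first row should not count, one option in Spark is `func.lag`, which returns null for the first row of each partition, so that row can be filtered out before averaging.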
