Median and MAE
Some time ago I encountered the following result: the optimal constant value that minimizes the mean absolute error is the median of a training set. I didn’t give much thought to that but lately I decided to find a proof of this statement. After some quick search I found this brilliant and concise explanation: Why does the median minimize 𝐸(|𝑋−𝑐|)? It took me a bit of time to remember some calculus I had in university to understand the solution. I decided to write this post with my explanations in order to make this proof even more accessible for others. So, let’s go!
First of all, a quick reminder what Mean Absolute Error is:
Also, we’ll need the definition of a median. Suppose we have a random variable X with a cumulative distribution function F(x) and a probability density function p(x). A median of this probability distribution is by definition any real number m that satisfies the inequalities:
Notice that the MAE above is defined for a sample of X values, so if we consider MAE for the whole set of X values, then MAE will turn into an expectation of |x-c|, where c is our estimate:
We can split the integral in two:
Let’s denote expectation of absolute error as J. In order to find the minimum of this function with respect to c, we will find its derivative and then set it to zero. We start with differentiation:
Notice that |x-c| = (x-c) for c ≤ x and |x-c| = (c-x) for c ≥ x. We can rewrite the expression above as:
Here we denoted the two integrals as I₁ and I₂. Using Leibniz integral rule and setting the derivative to zero we get:
So
The last equation is equivalent to
Given that
We obtain the desired result :
Q.E.D.!