You could have invented the determinant
I'm going to talk about how a geometric intuition becomes the definition of the determinant.
As many of you already know, the determinant of a matrix is the (hyper)volume of the (hyper)parallelepiped whose edges are the (row or column) vectors of the matrix. This page on Wolfram gives you an interactive visualization of a parallelogram or a parallelepiped while showing the corresponding determinant.
There are abundant sources on the internet (many on math.stackexchange.com) that give more or less the same explanation as the one above, but most of them don't explain why. That's what I want to discuss in this post.
First goal: Define area in 2D
I'll motivate this discussion with a simple question: how do we define volume in a Euclidean space?
I'm kidding. That question is actually not that simple, and people have gone to great lengths to answer that. Let me simplify it a bit more, quite gratuitously, so that we can cover it in one blog post:
How do we define the "area" of a parallelogram in a 2-dimensional space?
(I'm being intentionally vague here, but stay with me.)
First, let's agree that a parallelogram is defined by two vectors, like this:
In this picture, $u$ and $v$ are vectors in the 2-dimensional space, which I'll call $\mathbb R^2$ from now on. We're looking for a way to define the shaded area inside the parallelogram, i.e., a function $A$ that takes two vectors and gives out a number representing the area: \begin{gather*} A(u, v) = \text{ area of the parallelogram with $u$ and $v$ as edges }. \end{gather*}
It will turn out that defining a signed area is simpler than defining a non-negative one, so we'll allow $A$ to go negative, and we'll try to reason about what that means. Now, let's think about some properties that $A$ should have.
Property 1: Multilinear
The following picture should illustrate quite clearly why we want \begin{gather} A(u, v) + A(u, w) = A(u, v + w)\label{eq:add} \end{gather}
Obviously we want both shaded regions to have the same area. That translates directly into \eqref{eq:add}.
Note that the picture only captures the case where $v$ and $w$ both point upwards relative to $u$, which is pointing to the right. However, if $w$ actually points downward, \eqref{eq:add} would mean that there's some cancellation, i.e., $A(u, v)$ and $A(u, w)$ would have different signs. This is where the notion of signed area helps us: we don't need to split \eqref{eq:add} into cases.
If you're skeptical, you could say that the notion of signed area is forced by our artificial desire to satisfy \eqref{eq:add}. That may be the case, but it also suggests that the signed area could be a more natural definition than the non-negative area. In fact, this is just like what negative numbers mean in counting: borrowing.
(This is a common theme in algebra. Sometimes, "natural" definitions don't seem very natural at first, but they are indeed more natural once we broaden our minds to accept what "natural" means.)
For algebraic completeness, I should also mention that \begin{gather} \lambda A(u, v) = A(u, \lambda v)\label{eq:scalar} \end{gather} for any scalar $\lambda \in \mathbb R$. It should be pretty obvious to see this from the same picture above: just make $w$ parallel to $v$.
The two equations \eqref{eq:add} and \eqref{eq:scalar} can be combined into one:
\begin{gather} A(u, v + \lambda w) = A(u, v) + \lambda A(u, w)\label{eq:linear} \end{gather}where $\lambda \in \mathbb R$. This simply says that the function $v \mapsto A(u, v)$, with $u$ fixed, is linear.
If we swap the parameters in $A$, the same argument above still applies, and we can conclude that
\begin{gather} A(u + \lambda v, w) = A(u, w) + \lambda A(v, w).\label{eq:linear2} \end{gather} In words, the function $u \mapsto A(u, w)$, with $w$ fixed, is linear. There's a word for linearity in each parameter of a multi-parameter function (when all the other parameters are fixed): multilinearity.
Property 2: Alternating
Another observation we can make is that shearing the parallelogram does not change its area.
We can express this as an equation for $A$: \begin{gather} A(u, v + \lambda u) = A(u, v) \label{eq:sheer} \end{gather} for any scalar $\lambda \in \mathbb R$. In words, adding any multiple of $u$ to $v$ doesn't change $A(u, v)$ because it contributes no area. The special case where $v = 0$ and $\lambda = 1$ gives us $A(u, u) = A(u, 0)$, and $A(u, 0) = 0$ by \eqref{eq:scalar} (take $\lambda = 0$), so \begin{gather} A(u, u) = 0. \label{eq:alternating} \end{gather} Somehow people like to call a function with this property "alternating".
Note that we can also get
\begin{gather*} A(u + \lambda v, v) = A(u, v) \end{gather*}from the same argument applied to $A$ with parameters swapped, or from \eqref{eq:linear2} and \eqref{eq:alternating} like this:
\begin{align} A(u + \lambda v, v) = A(u, v) + \lambda A(v, v) = A(u, v). \label{eq:sheer2} \end{align}One convenient formula that we can derive from \eqref{eq:sheer2}, \eqref{eq:sheer} and \eqref{eq:linear2} is the following:
\begin{align*} A(u, v) & = A(u - v, v) & & \text{; from \eqref{eq:sheer2}}\\ & = A(u - v, v + u - v) & & \text{; from \eqref{eq:sheer}}\\ & = A(u - v, u) \\ & = A(-v, u) & & \text{; from \eqref{eq:sheer2}}\\ & = -A(v, u) & & \text{; from \eqref{eq:linear2}}. \end{align*}This means swapping the order of the parameters of $A$ will just flip the sign, which is consistent with our notion of "borrowing" when the signed area goes negative.
The property that $A(u, v) = -A(v, u)$ is called skew-symmetry, which is synonymous with being alternating in the 2-dimensional case.
Remark: Even though I derived \eqref{eq:alternating} from \eqref{eq:sheer}, the reverse direction is also possible. (In fact, it is simpler too.) We can start with a geometric intuition of \eqref{eq:alternating}: $A(u, u) = 0$ means if the two edges of a parallelogram are parallel, the area is 0. Then, we proceed to derive \eqref{eq:sheer} with one application of multilinearity.
\begin{align*} A(u, v + \lambda u) & = A(u, v) + \lambda \underbrace{A(u, u)}_0 = A(u, v). \end{align*}We actually did this already, when we derived \eqref{eq:sheer2}.
Base case
Knowing that $A$ has to be multilinear and alternating is almost enough to determine $A$ completely. We only need to define some more base cases. (As you may notice, \eqref{eq:linear} and \eqref{eq:linear2} are, in a sense, recursive definitions. \eqref{eq:alternating} gives us some base cases, but we don't have any base cases that will give us any non-zero values yet.)
Suppose $e_1$ and $e_2$ are the standard coordinate basis vectors of $\mathbb R^2$:
\begin{gather*} e_1 = \begin{pmatrix}1\\0\end{pmatrix}, \qquad e_2 = \begin{pmatrix}0\\1\end{pmatrix}. \end{gather*}Pictorially,
One natural choice for the base case of $A$ is
\begin{gather} A(e_1, e_2) = 1\label{eq:unit} \end{gather}which says that a 1-by-1 square has area 1. (This is not the only choice that we could pick, but it's quite natural.)
Consequences
After picking the basis vectors and the non-zero base case of $A$, we are now ready to compute $A(u, v)$ for any vectors $u$ and $v$. Let's write out $u$ and $v$ as
\begin{gather*} u = \begin{pmatrix}a\\b\end{pmatrix} = ae_1 + be_2, \qquad v = \begin{pmatrix}c\\d\end{pmatrix} = ce_1 + de_2. \end{gather*}Then:
\begin{align*} A(u, v) & = A(ae_1 + be_2, v) \\ & = aA(e_1, v) + bA(e_2, v) & & \text{; from \eqref{eq:linear2}} \\ & = aA(e_1, ce_1 + de_2) + bA(e_2, ce_1 + de_2) \\ & = acA(e_1, e_1) + adA(e_1, e_2) + bcA(e_2, e_1) + bdA(e_2, e_2) & & \text{; from \eqref{eq:linear}}\\ & = adA(e_1, e_2) + bcA(e_2, e_1) & & \text{; from \eqref{eq:alternating}} \\ & = ad - bc & & \text{; from \eqref{eq:unit}.} \end{align*}Note that I have used $A(e_2, e_1) = -A(e_1, e_2) = -1$ in the last line.
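If you'd like to see this formula in action, here is a minimal sketch in Python (the function name `area` is my own choice) that implements $A(u, v) = ad - bc$ and spot-checks the properties we used along the way.

```python
# A minimal sketch (the name `area` is mine) of the signed-area function
# A(u, v) = ad - bc derived above, with numeric spot-checks of its properties.

def area(u, v):
    """Signed area of the parallelogram with edges u and v (2D vectors)."""
    a, b = u
    c, d = v
    return a * d - b * c

u, v, w = (3, 1), (1, 2), (-2, 5)
lam = 2

# Multilinearity in the second argument:
assert area(u, (v[0] + lam * w[0], v[1] + lam * w[1])) == area(u, v) + lam * area(u, w)
# Shear invariance: adding a multiple of u to v doesn't change the area.
assert area(u, (v[0] + lam * u[0], v[1] + lam * u[1])) == area(u, v)
# Skew-symmetry: swapping the arguments flips the sign.
assert area(u, v) == -area(v, u)
# Base case: the unit square has area 1.
assert area((1, 0), (0, 1)) == 1
```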
It probably hasn't been such a secret that our $A$ is the determinant. (I didn't intend for it to be a secret anyway.) The derivation above can be written with the usual determinant notation like this:
\begin{align} \begin{vmatrix} a & c\\b & d \end{vmatrix} & = a\begin{vmatrix}1 & c\\0 & d\end{vmatrix} + b\begin{vmatrix}0 & c\\1 & d\end{vmatrix}\nonumber\\ & = ac \underbrace{ \begin{vmatrix} 1 & 1\\0 & 0 \end{vmatrix}}_0 + ad \underbrace{ \begin{vmatrix} 1 & 0\\0 & 1 \end{vmatrix}}_1 + bc \underbrace{ \begin{vmatrix} 0 & 1\\1 & 0 \end{vmatrix}}_{-1} + bd \underbrace{ \begin{vmatrix} 0 & 0\\1 & 1 \end{vmatrix}}_0 \nonumber\\ & = ad - bc. \label{eq:formulafor2} \end{align}
Conclusion
We started with an attempt to define the area of a parallelogram, and we observed that our geometric intuition of area leads to the two key properties: multilinearity and skew-symmetry. These two key properties plus the "base case" choice (that $A(e_1, e_2) = 1$) are all we need to define $A$ for any pairs of vectors. If we write vectors as coordinates, $A$ becomes the usual determinant.
Addendum: Consistency of properties
How do we ensure that \eqref{eq:linear}, \eqref{eq:linear2}, \eqref{eq:alternating} and \eqref{eq:unit} are consistent, i.e., that they do not contradict each other?
We can prove the consistency by starting from the formula
\begin{gather} A(ae_1 + be_2, ce_1 + de_2) = ad - bc,\label{eq:formula} \end{gather}which is well-defined for all pairs of vectors, then deriving \eqref{eq:linear}, \eqref{eq:linear2}, \eqref{eq:alternating} and \eqref{eq:unit} from \eqref{eq:formula}. I'm not going to show the derivation here though because it's very straightforward and easily searchable online (and in textbooks too).
This reverse way of developing math may seem a bit silly, but it is not pointless. Quite a lot of math is done the same way: we use intuition to write down properties, use those properties to make up definitions, then use those definitions to prove those properties. This ensures that our definitions capture our intuition correctly.
Higher Dimensions
Everything discussed so far extends easily to higher dimensions, but let me flesh out the details a little bit for $\mathbb R^3$. Let $V(x, y, z)$ be the signed volume of a parallelepiped with sides $x$, $y$ and $z$.
Property 1: Multilinear
As depicted below, we would want that
\begin{gather*} V(x, y, u + v) = V(x, y, u) + V(x, y, v). \end{gather*}In words, if we fix $2$ parameters in $V$, we get a function with $1$ parameter that is linear.
The situation is similar for an $n$-dimensional space. If we fix any $n - 1$ parameters, we get a function with $1$ parameter that is linear. This is the actual definition of multilinearity.
Property 2: Alternating
As depicted below, we would want that
\begin{gather*} V(x, y, u + \lambda x + \mu y) = V(x, y, u) \end{gather*}for any scalars $\lambda, \mu \in \mathbb R$. I chose $\lambda = \mu = \frac 12$ in the picture, but they can be any scalar values. This is consistent with the intuition that shearing the parallelepiped does not change its volume.
As in the 2-dimensional case, the alternating property can be derived from this formula, or used to derive it. The geometric interpretation of the alternating property is that if two vectors coincide, the resulting parallelepiped will have zero volume. As equations:
\begin{align*} V(u, u, v) & = 0 \\ V(u, v, u) & = 0 \\ V(v, u, u) & = 0 \end{align*}for all vectors $u$ and $v$.
The generalization to an $n$-dimensional space is straightforward: if any two of the $n$ vectors are equal, the resulting (hyper)parallelepiped will have zero (hyper)volume.
Base case
The most obvious choice for the base case for $\mathbb R^3$ is
\begin{align*} V(e_1, e_2, e_3) = 1. \end{align*}The way to extend this to $\mathbb R^n$ is also obvious:
\begin{align*} V(e_1, e_2, \ldots, e_n) = 1. \end{align*}
Interlude: Permutations
As a shortcut for computing the determinant, let me restate the skew-symmetry property:
\begin{gather*} A(u, v) = -A(v, u) \end{gather*}for $u, v \in \mathbb R^2$. This generalizes to $n$-dimensions as follows:
\begin{gather*} V(\ldots, u, \ldots, v, \ldots) = -V(\ldots, v, \ldots, u, \ldots) \end{gather*}where the "$\ldots$" in the same position on both sides of the equation represents the same sequence of vectors. This can be proved from the alternating property and multilinearity, just like how we proved it for the 2-dimensional case. I'll just repeat the proof here (with additional "$\ldots$") for ease of reference:
\begin{align*} V(\ldots, u, \ldots, v, \ldots) & = V(\ldots, u - v, \ldots, v, \ldots) \\ & = V(\ldots, u - v, \ldots, v + u - v, \ldots) \\ & = V(\ldots, u - v, \ldots, u, \ldots) \\ & = V(\ldots, -v, \ldots, u, \ldots) \\ & = -V(\ldots, v, \ldots, u, \ldots). \end{align*}In matrix-based terms, this means swapping columns changes the sign of the determinant. This mental shortcut can simplify the way we think about the signed volume of permuted basis vectors. For example, if we want to compute
\begin{gather*} V(e_2, e_3, e_1), \end{gather*}we only need to find how many swaps need to be done to sort $(2, 3, 1)$. In this case, we need 2:
\begin{gather*} (2, \underline{3}, \underline{1}) \\ \Downarrow \\ (\underline{2}, \underline{1}, 3) \\ \Downarrow \\ (1, 2, 3) \end{gather*}This is not the only way to sort $(2, 3, 1)$ though. There are other ways that use 2, 4, 6, or any even number of swaps. (For example, I can swap $1$ and $2$ twenty times before I actually try to sort it.) But there will be no way to use an odd number of swaps to achieve this! Why so? That could be a topic for another post, but for now you can read more here.
Note: It is tempting to say that we have just proved that the parity of a permutation is well-defined because the determinant is well-defined, e.g., $V(e_2, e_3, e_1)$ can have only one value; but we have not really proved that the determinant is well-defined yet in the general case. To do that, we will have to state the definition of the determinant in a form like \eqref{eq:formula} first, then use that definition to prove the two properties we stipulated. The problem is that writing out the definition of the determinant for a general $n$ will involve signs of permutations already.
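If you want to experiment with this, here is a small Python sketch (the helper name is my own) that determines the sign of a permutation by counting the swaps it takes to sort it.

```python
# A quick sketch (my own helper) that counts swaps while sorting a permutation,
# giving the sign that the shortcut above relies on.

def permutation_sign(perm):
    """Return +1 if perm can be sorted with an even number of swaps, -1 otherwise."""
    p = list(perm)
    swaps = 0
    for i in range(len(p)):
        while p[i] != i + 1:          # put the value i+1 into position i
            j = p[i] - 1              # position where the value p[i] belongs
            p[i], p[j] = p[j], p[i]   # one swap
            swaps += 1
    return 1 if swaps % 2 == 0 else -1

print(permutation_sign((2, 3, 1)))  # 1, so V(e2, e3, e1) = +V(e1, e2, e3) = 1
print(permutation_sign((2, 1, 3)))  # -1, one swap away from sorted
```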
As an example of how this shortcut helps in calculating the determinant, let me derive the formula for the case $n = 3$:
\begin{eqnarray*} \begin{vmatrix} a & d & g\\b & e & h\\c & f & i \end{vmatrix} & = & aei \underbrace{\begin{vmatrix} 1 & 0 & 0\\0 & 1 & 0\\0 & 0 & 1 \end{vmatrix}}_{\text{no swaps}} + ahf \underbrace{\begin{vmatrix} 1 & 0 & 0\\0 & 0 & 1\\0 & 1 & 0 \end{vmatrix}}_{\text{1 swap}} \\ & & + dbi \underbrace{\begin{vmatrix} 0 & 1 & 0\\1 & 0 & 0\\0 & 0 & 1 \end{vmatrix}}_{\text{1 swap}} + dhc \underbrace{\begin{vmatrix} 0 & 1 & 0\\0 & 0 & 1\\1 & 0 & 0 \end{vmatrix}}_{\text{2 swaps}} \\ & & + gbf \underbrace{\begin{vmatrix} 0 & 0 & 1\\1 & 0 & 0\\0 & 1 & 0 \end{vmatrix}}_{\text{2 swaps}} + gec \underbrace{\begin{vmatrix} 0 & 0 & 1\\0 & 1 & 0\\1 & 0 & 0 \end{vmatrix}}_{\text{1 swap}} \\ & = & aei - ahf - dbi + dhc + gbf - gec. \end{eqnarray*}Note: I skipped abundant applications of the alternating property that would result in zero summands, like $adg\begin{vmatrix}1 & 1 & 1\\0 & 0 & 0\\0 & 0 & 0\end{vmatrix} = 0$.
General definition
It should be quite obvious now how to extend the definition of the determinant to an arbitrary dimension $n$. The formula, called the Leibniz formula, looks like this
\begin{gather} V(x_1, \ldots, x_n) = \sum_{\sigma \in S_n} \text{sgn}(\sigma) \prod_{i=1}^n x_{i}^{\sigma(i)} \label{eq:leibniz} \end{gather}where each $x_i^j$ is the $j$-th coordinate of $x_i \in \mathbb R^n$, $S_n$ is the set of permutations on $\{1, \ldots, n\}$, and $\text{sgn}(\sigma)$ is the sign of $\sigma$, which is defined as $1$ if $\sigma$ is expressible as an even number of swaps, or $-1$ otherwise.
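As a sanity check, here is a direct, deliberately naive translation of \eqref{eq:leibniz} into Python (the names are my own); it enumerates all permutations, so it is only meant for small examples.

```python
# A deliberately naive translation of the Leibniz formula into code (names are
# mine); it enumerates all n! permutations, so it is a sanity check rather than
# a practical algorithm.
from itertools import permutations


def sign(perm):
    """Sign of a permutation, computed by counting inversions."""
    inversions = sum(1 for i in range(len(perm))
                       for j in range(i + 1, len(perm))
                       if perm[i] > perm[j])
    return -1 if inversions % 2 else 1


def leibniz_det(columns):
    """Determinant of the matrix whose i-th column is columns[i]."""
    n = len(columns)
    total = 0
    for perm in permutations(range(n)):
        term = 1
        for i in range(n):
            term *= columns[i][perm[i]]   # the sigma(i)-th coordinate of x_i
        total += sign(perm) * term
    return total


# 2-by-2 check against ad - bc:
assert leibniz_det([(3, 1), (1, 2)]) == 3 * 2 - 1 * 1
# 3-by-3 check against the expansion derived above:
a, b, c = 2, 0, 1
d, e, f = 1, 3, 0
g, h, i = 0, 1, 4
assert leibniz_det([(a, b, c), (d, e, f), (g, h, i)]) \
    == a*e*i - a*h*f - d*b*i + d*h*c + g*b*f - g*e*c   # both are 25
```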
Note that by saying that $\sigma$ is a permutation, we already eliminate summands with repeated basis vectors that would appear if we expand everything using multilinearity like we did in \eqref{eq:formulafor2}. This reduces the number of summands from $n^n$ to $n!$, yet it's still not good enough for practical computation. There are many algorithms for computing the determinant that are far more efficient than using \eqref{eq:leibniz} directly. The algorithmic aspect of the determinant is out of scope of this post though, so I'll stop right here.
Extra: $\det(AB) = \det(A)\det(B)$
We defined the determinant based on the base case that was
\begin{gather*} V(e_1, e_2, \ldots, e_n) = 1, \end{gather*}where $(e_1, \ldots, e_n)$ is the standard ordered basis of $\mathbb R^n$. Obviously any ordered basis of $\mathbb R^n$ other than the standard one can be chosen, and we'll get a different volume function $V$ for each ordered basis. The number on the right-hand side does not have to be $1$ either. So, let's look a bit deeper into the Leibniz formula to see what these variations might lead to.
In the Leibniz formula shown above \eqref{eq:leibniz}, $x_i^j$ are coordinates w.r.t. the ordered basis $e = (e_1, \ldots, e_n)$, and the result was based on the choice that $V(e_1, \ldots, e_n) = 1$. Since we will vary the base case, we'll be more specific with the definition of $V$ now by adding a subscript $e$ and a superscript $\nu$ to indicate that the volume function is based on these choices:
\begin{gather} V_e^\nu(e_1, \ldots, e_n) = \nu. \label{eq:pickingbasecase} \end{gather}Since we did not really use the fact that $e_i$'s are the standard ordered basis of $\mathbb R^n$ (we only used the fact that they form a basis), any ordered basis $b = (b_1, \ldots, b_n)$ can substitute for $e$ and everything discussed so far still applies. But what about the case $\nu \ne 1$?
Non-unit base case value
Recall that in the computation of $V$ in $\mathbb R^3$, we used multilinearity to get a summation in which each summand is of the form
\begin{gather*} (\text{some product of coordinates}) \cdot V(e_i, e_j, e_k). \end{gather*}We simplified each term by using the alternating property to get $0$, or the swapping logic to get
\begin{gather*} \pm(\text{some product of coordinates}) \cdot V(e_1, e_2, e_3). \end{gather*}Each summand is proportional to $V(e_1, e_2, e_3)$, hence the sum is also proportional to $V(e_1, e_2, e_3)$. If we pick a base value $\nu$ different from $1$ (and, more generally, replace $e$ with an arbitrary ordered basis $b$), we arrive at a slightly more general Leibniz formula:
\begin{gather} V_b^\nu(x_1, \ldots, x_n) = \left(\sum_{\sigma \in S_n} \text{sgn}(\sigma) \prod_{i=1}^n (x_i)^{\sigma(i)}_b\right) \cdot \nu \label{eq:generalleibniz} \end{gather}where $(x_i)_b$ are the coordinate vectors of $x_i$ w.r.t. ordered basis $b = (b_1, \ldots, b_n)$, and $(x_i)_b^j$ are their components, i.e., $x_i = \sum_{j=1}^n (x_i)_b^j b_j$.
If we let $x_b$ be the matrix whose columns are $(x_i)_b$, \eqref{eq:generalleibniz} can be written as
\begin{gather} V_b^\nu(x_1, \ldots, x_n) = \det(x_b) \cdot \nu, \label{eq:generalvolume}\\ \text{ where } \det(a) = \sum_{\sigma \in S_n} \text{sgn}(\sigma)\prod_{i=1}^n a_i^{\sigma(i)}\label{eq:det} \end{gather}for any matrix $a$ with components $a_i^j$. $\det$ here is like $V_e^1$, but it is defined as a function of coordinate values---that is $\det$ takes a matrix (a 2-dimensional array of numbers) while $V$ takes a list of vectors.
Notation regarding change of basis
Suppose $c = (c_1, \ldots, c_n)$ is another ordered basis, each $c_i$ has $(c_i)_b$ as its coordinate vector w.r.t. $b$, and $I^c_b$ is the matrix whose columns are $(c_i)_b$. I write $I^c_b$ to hint that it is the matrix representation of the identity map w.r.t. $c$ and $b$, i.e., it takes coordinate vectors w.r.t. $c$ to coordinate vectors w.r.t. $b$. Then, for any vector $x$, we have the following relationship:
\begin{gather} x_b = I^c_b x_c. \label{eq:changeofbasis} \end{gather}where $x_b$ is the coordinate vector of $x$ w.r.t. $b$, and $x_c$ is the coordinate vector of $x$ w.r.t. $c$.
If we have many vectors, say $x_1, \ldots, x_m$ ($m$ doesn't have to be equal to $n$), we can put the coordinate vectors $(x_i)_c$ together as columns of a matrix $x_c$, and similarly for $x_b$. Equation \eqref{eq:changeofbasis} still holds, but now $x_b$ and $x_c$ can be matrices too.
This coordinate transformation works in the reverse direction too: we get $x_c = I^b_c x_b$, where the columns of $I^b_c$ are coordinate vectors of $b_i$ w.r.t. $c$. The matrix $I^b_c$ is the inverse matrix of $I^c_b$, as it is apparent that $I^b_c I^c_b x_c = x_c$ and $I^c_b I^b_c x_b = x_b$ for any matrices $x_b$ and $x_c$. We won't need this in this post though.
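Here is a small numeric illustration with numpy, just to make the notation concrete; the basis choices are arbitrary examples of mine.

```python
# A small numeric illustration (numpy, names mine) of the change-of-basis
# relation x_b = I^c_b x_c in R^2.
import numpy as np

B = np.array([[1.0, 1.0],
              [0.0, 2.0]])        # columns are b_1, b_2 in standard coordinates
C = np.array([[2.0, 0.0],
              [1.0, 1.0]])        # columns are c_1, c_2 in standard coordinates

I_c_to_b = np.linalg.solve(B, C)  # columns are (c_i)_b, i.e. the matrix I^c_b
I_b_to_c = np.linalg.solve(C, B)  # the matrix I^b_c

x = np.array([3.0, 5.0])          # some vector, in standard coordinates
x_b = np.linalg.solve(B, x)       # coordinates of x w.r.t. b
x_c = np.linalg.solve(C, x)       # coordinates of x w.r.t. c

assert np.allclose(x_b, I_c_to_b @ x_c)             # x_b = I^c_b x_c
assert np.allclose(I_b_to_c @ I_c_to_b, np.eye(2))  # I^b_c inverts I^c_b
```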
Equivalent volume functions
If we have two ordered bases $b$ and $e$, is it possible to define $V_b^\nu$ such that
\begin{gather} V_b^\nu(x_1, \ldots, x_n) = V_e^1(x_1, \ldots, x_n) \text{ ?} \label{eq:equalvolumes} \end{gather}Let's try to do this. Start by plugging in the special value for $x_i$: picking $x_i = b_i$ gives us
\begin{gather*} V_b^\nu(b_1, \ldots, b_n) = \nu \end{gather*}by the definition of $V_b^\nu$ \eqref{eq:pickingbasecase}. Now, \eqref{eq:generalvolume} gives us
\begin{gather*} V_e^1(b_1, \ldots, b_n) = \det(I^b_e), \end{gather*}which can be substituted into \eqref{eq:equalvolumes} to yield
\begin{gather*} \nu = \det(I^b_e). \end{gather*}This is indeed an intuitive value that we could have guessed because if we use $V_e^1$ as the base definition of volume, we would expect the parallelepiped whose sides are $b_i$ to have volume $\det(I^b_e)$. (Recall columns of $I^b_e$ are coordinate vectors of $b_i$ w.r.t. $e$.) Now that we know what the base value $\nu$ should be, $V_b^\nu$ is completely determined. We are left to show that \eqref{eq:equalvolumes} holds.
A common method for a formal proof of this kind is to compute the difference: let $D = V_e^1 - V_b^\nu$. Then,
\begin{gather} D(b_1, \ldots, b_n) = \nu - \nu = 0. \label{eq:basecaseofdifference} \end{gather}From $D = V_e^1 - V_b^\nu$, it is straightforward to see that $D$ is multilinear and alternating. Equation \eqref{eq:basecaseofdifference} gives the base case of $D$, which coincides with the base case of $V_b^0$, thus $D = V_b^0$. Since $V_e^1 - V_b^\nu = D = V_b^0 = 0$, we have proved that $V_e^1 = V_b^\nu$.
Consequences
The consequence of \eqref{eq:equalvolumes} is that the choice $\nu = \det(I^b_e)$ makes
\begin{align} V_e^1(x_1, \ldots, x_n) & = V_b^{\nu}(x_1, \ldots, x_n) \nonumber\\ & = \det(x_b) \cdot \nu & & \text{; from \eqref{eq:generalvolume}}\nonumber\\ & = \det(x_b) \cdot \det(I^b_e). \label{eq:volumeinsecondbasis} \end{align}However, we can also apply \eqref{eq:generalvolume} to $V_e^1$ directly:
\begin{align} V_e^1(x_1, \ldots, x_n) & = \det(x_e) \nonumber\\ & = \det(I^b_e x_b). \label{eq:volumeinfirstbasis} \end{align}We now have two expressions that are equal to $V_e^1(x_1, \ldots, x_n)$: \eqref{eq:volumeinsecondbasis} and \eqref{eq:volumeinfirstbasis}. Equating them gives us the main result we are after: multiplicativity of $\det$!
\begin{align} \det(I^b_e x_b) & = \det(I^b_e) \cdot \det(x_b). \label{eq:multiplicativityofdet} \end{align}Although this equation seems to restrict the matrices to be of the forms $I^b_e$ and $x_b$, it is in fact a general formula because both $I^b_e$ and $x_b$ can be chosen arbitrarily.
Note: Equation \eqref{eq:multiplicativityofdet} is a formula about the $\det$ function itself, not the signed volume $V$. The multiplicative property wouldn't even make sense for $V$ because $V$ takes a list of vectors, not a matrix.
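And, as a final sanity check, \eqref{eq:multiplicativityofdet} is easy to confirm numerically:

```python
# A numeric check (numpy) of det(MN) = det(M) det(N) on arbitrary matrices.
import numpy as np

rng = np.random.default_rng(1)
M = rng.normal(size=(4, 4))
N = rng.normal(size=(4, 4))

assert np.isclose(np.linalg.det(M @ N), np.linalg.det(M) * np.linalg.det(N))
```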