Demystifying lvalues

The & operator in C can be confusing. If we do p[1][2] then two memory loads happen: one for p[1] and one for p[1][2]. If we do &(p[1][2]) then only one load happens. This is weird, because that means that the semantics of & cannot be compositional: the meaning of &(p[1][2]) is not the meaning of & applied to p[1][2].

Or does it? Below I will give a compositional reading of & and * that I learned from Robbert Krebbers, and I think is slightly better than the usual explanation or Oleg’s explanation.

We pretend that we have two pointer types, ptr<t> and lvalue<t>. The type ptr<t> is written written t* in C, but I write ptr<t> for consistency. The type lvalue<t> is a new type that we introduce. An lvalue<t> is represented as an integer, exactly like a ptr<t>. The purpose of lvalues is to control automatic dereference: an lvalue<t> is can be automatically dereferenced, whereas a ptr<t> cannot be.

*p takes a ptr<t> and turns it into an lvalue<t>, not dereferencing it, but preparing it for automatic dereference
&x takes an lvalue<t> and turns it into a ptr<t>, thus supressing the automatic dereference
assignment x = y operates on an lvalue x
a plain variable mention x gives an lvalue, namely the memory location of the variable on the stack
accessing a field or an array element on an lvalue gives an lvalue (at runtime this is an integer add)
load takes an lvalue<t> and turns it into a t, doing the actual memory access. load is implicitly inserted by the compiler and never written in the source code.

Note that *p and &x don’t actually do anything, they just change the type of the expression. Only load and assignment x = y actually do something.

Example 1:

p[1] + 2

Firstly, p[1] is equivalent to *(p+1), so this is really:

*(p+1) + 2

The *(p+1) gives an lvalue, but for the addition + 2 we need the actual value, so we need to insert a load:

load(*(p+1)) + 2

Example 2:

&(p[1])

This is equivalent to &*(p+1). The & and * cancel out, so this is just:

p+1

Example 3:

&(p[1][2])

This is equivalent to &(*(*(p+1) + 2)). As before, for the addition *(p+1) + 2 we need the actual value, so we need to insert a load:

&(*(load(*(p+1)) + 2))

The outer & and * cancel out, so this is just:

load(*(p+1)) + 2

Example 4:

&x

We take the address of a variable. This works because if we mention a variable, we get an lvalue, which is the memory location of the variable. Subsequently, the &x gives a ptr<int>, thus supressing the automatic dereference.

In fact, in the above examples we should have inserted even more loads: in an expression like p+1, where p is a variable, we should have inserted a load to get the actual value of p:

load(p) + 1

This makes sense because variables are stored on the stack, we we do need an actual memory load. Storing variables in registers is an optimization that the compiler does for us, but only if it can prove that no pointer to the variable will be used.

Summary

We can view * and & as no-ops that change the type of a pointer to an lvalue and vice versa. Memory loads are implicitly inserted whenever we need the actual value of an lvalue.

The reading of expressions like p[1][2] and a.foo[3] is that they always compute an address. That is, the type of these should not be viewed as int, but as lvalue<int>, which is a pointer type. When doing 1 + p[1][2] or 1 + a.foo[3] we need to insert a load to get the actual int out of the lvalue<int>. When doing &p[1][2] or &a.foo[3] the & simply supresses the insertion of a load. This gives a compositional reading of such expressions.