The basic idea behind references is pretty simple. Instead of having a variable hold a value, we want the variable to hold a reference to some value. A reference is effectively the address of the memory location where the value has been stored. Using this reference, we can access or change the value the reference points to.
Let's illustrate this with a simple example. The & operator obtains a reference to a variable, and therefore to the value it holds. The * (de-referencing) operator can then be applied to the reference to get or change the value pointed to by the reference:
    mut a = 3
    imm ref = &a     // ref holds a reference to a's contents
    imm val = *ref   // val is now 3
    *ref = 4         // a is now set to 4
Benefits and Safety Risks
References offer important benefits:
- Shared access. Multiple references can point to the same value, such that all of them see and can change the latest value at that address. If one reference is used to change a value, another reference to that value sees the updated value. By contrast, values passed around between functions do not work this way, as passed values are copies. Any change to a copy leaves the original value unchanged.
- Space efficiency. Some data values can be large, such that making copies of them as they are passed around takes up a lot of space and wastes time in the copying. In these situations, it can be more efficient to pass around a reference to the large value.
- Speed of Vectorization. If multiple values are co-located right next to each other in memory, it can be faster to use a pointer to process those values sequentially, rather than work with each value separately, one at a time.
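The contrast between shared access and copying can be sketched in Rust, used here only as a stand-in because its borrowed references behave in a broadly similar way. This is not Cone syntax, and the function names `bump_copy` and `bump_in_place` are hypothetical.

```rust
/// Receives a copy of the caller's value; the caller's variable is unaffected.
fn bump_copy(mut n: i32) -> i32 {
    n += 1;
    n
}

/// Receives a mutable reference; the change is visible to the caller.
fn bump_in_place(n: &mut i32) {
    *n += 1;
}

fn main() {
    let mut a = 3;

    // Passing by value hands the function its own copy of `a`.
    let copied = bump_copy(a);
    println!("after bump_copy: a = {a}, copied = {copied}");

    // Passing a reference lets the function change `a` itself.
    bump_in_place(&mut a);
    println!("after bump_in_place: a = {a}");
}
```

Only the second call changes `a`, because only it was given the address of the original value.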
However, without appropriate constraints, the unfettered ability to access and change any arbitrary value based on its address opens up a program to significant safety risks:
- Memory unsafe. A pointer might point to an object that has been deleted and no longer exists. It could also point to an area of memory (e.g., NULL) that has no valid objects.
- Concurrency unsafe. One pointer might try to access an object while its contents are in the process of being changed by another pointer. Since the contents are in transition, they may not make sense to an outside pointer. Worse would be when two pointers try to change the same object at the same time across multiple steps, making a mess of it in the attempt.
- Type unsafe. A pointer might point to an object whose type differs from the one the program expected (e.g., the program thought it was pointing at an integer, but there is a float there instead).
Such failures are not always easy to detect and eliminate when the responsibility for being careful lies entirely with fallible humans. Although the consequences are not always serious, they can be catastrophic, especially when malicious actors take advantage of undetected pointer problems to gain access to, or control over, other people's privileges or information.
Because of these safety risks, Cone explicitly distinguishes between references and pointers. Although both offer addressable access to values, references are constrained to prevent unsafe use. Pointers are not subject to such constraints, making them more versatile but also potentially more dangerous.
Reference Type Signature
The way that Cone handles references is unusual among programming languages. Although many capabilities are similar to those found in other languages, Cone's distinctive approach optimizes for versatility, safety and ease-of-use.
Getting comfortable with Cone's references begins with an introduction to the four fundamental mechanisms that underlie their capabilities and constraints. These mechanisms are formalized as part of the type signature for every reference.
- Allocators. A reference's allocator is responsible for allocating new values. The allocator also tracks all allocated values, automatically disposing of any values no longer needed. Allocators are what makes it possible for a Cone program to support multiple memory strategies, such as tracing GC, ref counting, escape-based RAII, arenas, pools, etc. In effect, allocators help ensure memory safety. A reference that has no allocator is a borrowed reference.
- Lifetimes. Cone automatically tracks the lifetime of every reference based on its lexical scope. Lifetimes are invaluable for ensuring that allocated objects are freed only when no reference points to them. They also ensure that borrowed references can never point to a freed value. In these ways, lifetimes also help improve memory safety.
- Permissions. Permissions govern what may be done with a reference. Each distinct permission grants and denies certain rights, such as the ability to read or change the value the reference points at, or the ability to alias (create a copy of) a reference. Permissions also govern the use of atomics, locks and other synchronization mechanisms. Permissions help ensure concurrency safety.
- Value type. The value type specifies the type of the value that the reference points to. References are often treated as simply a stand-in for a value of this type. The value type helps protect type safety.
The following pages provide details on each of these mechanisms. For now, it is helpful to know that the type signature for a reference always begins with & followed by qualifiers for each of the above mechanisms in the listed order. Here are examples of reference type signatures:
    imm ref1 &rc imm i32    // ref-counted, immutable reference to an integer value
    imm ref2 &i32           // borrowed, 'const' reference to an integer value. Lifetime assumed.
    imm ref3 &'a mut i32    // borrowed, 'mut' reference with specified lifetime annotation
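For readers who know Rust, the three signatures above have rough counterparts there. The sketch below is an analogy only: Rust expresses reference counting through the library type `Rc` rather than an allocator qualifier, and the function names `read`, `add_one`, and `share` are hypothetical.

```rust
use std::rc::Rc;

/// Borrowed, read-only reference; the lifetime is inferred by the compiler.
fn read(r: &i32) -> i32 {
    *r
}

/// Borrowed, mutable reference with an explicit lifetime annotation 'a.
fn add_one<'a>(r: &'a mut i32) -> &'a mut i32 {
    *r += 1;
    r
}

/// Reference-counted, immutable handles to one shared integer.
/// The integer is freed when the last handle is dropped.
fn share(value: i32) -> (Rc<i32>, Rc<i32>) {
    let first = Rc::new(value);
    let second = Rc::clone(&first);
    (first, second)
}

fn main() {
    let (first, second) = share(10);
    println!("handles alive: {}", Rc::strong_count(&first));
    println!("shared value: {}", read(&second));

    let mut n = 1;
    let r = add_one(&mut n);
    *r += 1;
    println!("n = {n}");
}
```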
By default, references can only point to valid values. However, it is possible to explicitly declare and use nullable references. A nullable reference can have the value null, which means the reference does not point to any valid value.
The type signature for a nullable reference specifies a ? after the ampersand:
imm ref4 &?i32 // borrowed, 'const' nullable reference
To ensure safety, access to a nullable reference's value is only possible if the code first ensures the reference does not have the value of null:
    // This condition is true only if maybePoint is not null ...
    if (maybePoint)
        imm point = *maybePoint   // ... allowing us to obtain its value
    imm point2 = *maybePoint      // **ERROR** We don't know if maybePoint is null here
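The same "check before use" discipline can be illustrated in Rust, where an optional reference is written `Option<&T>`. This is an analogue rather than Cone syntax; the `Point` struct and the function `point_or_origin` are assumptions made for the sketch.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
struct Point {
    x: i32,
    y: i32,
}

/// Returns the pointed-to value, or the origin when there is no reference.
fn point_or_origin(maybe_point: Option<&Point>) -> Point {
    // The reference can only be dereferenced inside the branch
    // that has proved it is present.
    if let Some(point) = maybe_point {
        *point
    } else {
        Point { x: 0, y: 0 }
    }
}

fn main() {
    let p = Point { x: 1, y: 2 };
    println!("{:?}", point_or_origin(Some(&p)));
    println!("{:?}", point_or_origin(None));
}
```

Outside the checked branch, the compiler rejects any attempt to dereference `maybe_point` directly, which mirrors the error case described above.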
References are values. As such, they can be stored and passed around a program. Whether such transfers are simple copies or moves depends on the reference's type (particularly its permission and allocator). Reference transfers also check reference type information to ensure everyone is in agreement about what you can do with any passed-around reference.
In general, references are treated as stand-ins for the values they refer to. Operations on references will nearly always apply to the value the reference refers to, rather than the references themselves. However, there are a few operations which do operate directly on references: dereferencing, comparison, and reinterpretation.
As several examples have demonstrated, the * operator is used to access a reference's value. This is called de-referencing. De-referencing a reference is sometimes prohibited based on its permission or whether it might have null as a value.
For most operations on a reference, a reference is automatically de-referenced before performing the operation. For example:
    // Assume ref1 is a reference to an integer
    imm sum = ref1 + 4   // equivalent to *ref1 + 4
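Rust shows a comparable convenience, although it gets there differently: its standard library defines addition for `&i32`, and method calls look through references automatically. The sketch below is an analogy under those assumptions, with hypothetical function names `plus_four` and `magnitude`.

```rust
/// Adds 4 to the referenced integer without an explicit dereference.
fn plus_four(r: &i32) -> i32 {
    r + 4 // same result as *r + 4
}

/// Method calls also reach through the reference to the underlying value.
fn magnitude(r: &i32) -> i32 {
    r.abs() // same result as (*r).abs()
}

fn main() {
    let n = -6;
    println!("{}", plus_four(&n));
    println!("{}", magnitude(&n));
}
```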
Two references may be compared for equality, but only if they have the same type signature:
    // Do ref1 and ref2 point to the same value?
    if ref1 == ref2
        // do some stuff
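Comparing references themselves is different from comparing the values they point to. Rust makes the distinction explicit, so it serves as a useful contrast: there, `==` on two references compares the referenced values, while `std::ptr::eq` asks whether they point to the same location. The function names below are hypothetical, and none of this is Cone syntax.

```rust
/// True when both references point at the same memory location.
fn same_target(a: &i32, b: &i32) -> bool {
    std::ptr::eq(a, b)
}

/// True when the referenced values are equal, wherever they are stored.
fn same_value(a: &i32, b: &i32) -> bool {
    a == b
}

fn main() {
    let x = 5;
    let y = 5;
    println!("same target (x, x): {}", same_target(&x, &x));
    println!("same target (x, y): {}", same_target(&x, &y));
    println!("same value  (x, y): {}", same_value(&x, &y));
}
```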
It is also possible to compare whether one reference is greater than another. This is typically only meaningful if both refer to somewhere within the same object.
The 'as' operator may be used to take an existing reference and create a new reference with a different type signature. Whenever this might create a safety risk, the 'as' operator must be used within a 'trust' block.
imm newref = oldref as &Point // Coerce a reference to a borrowed reference
Array and Interface References
In addition to the basic references introduced on this page and explored over the next few pages, Cone supports more complex reference types that carry more than a pointer. These are covered elsewhere:
- Array references, which point to a collection of identically-typed elements. These elements are contiguous in memory with the reference pointing to the first element. The number of elements is carried as part of the array reference.
- Interface references, which provide a generic view of some value defined by one of several possible struct types. In addition to the object pointer, the reference also points to runtime type information about the value's type, such as a vtable for dynamically dispatching method calls on the object.
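Both ideas have close relatives in Rust, which can make the "more than a pointer" point concrete: a slice reference `&[T]` carries an element count, and a trait-object reference `&dyn Trait` carries a vtable pointer. The sketch below is an analogy, not Cone syntax; the `Shape` trait, the `Square` and `Rect` structs, and the functions `total` and `describe` are all assumptions made for illustration.

```rust
use std::mem::size_of;

trait Shape {
    fn area(&self) -> f64;
}

struct Square(f64);
struct Rect(f64, f64);

impl Shape for Square {
    fn area(&self) -> f64 {
        self.0 * self.0
    }
}

impl Shape for Rect {
    fn area(&self) -> f64 {
        self.0 * self.1
    }
}

/// Sums a contiguous run of elements; the length travels with the reference.
fn total(values: &[i32]) -> i32 {
    values.iter().sum()
}

/// Calls `area` through runtime dispatch on whichever struct is behind the reference.
fn describe(shape: &dyn Shape) -> f64 {
    shape.area()
}

fn main() {
    let data = [1, 2, 3, 4];
    println!("sum of tail: {}", total(&data[1..]));
    println!("square area: {}", describe(&Square(2.0)));
    println!("rect area:   {}", describe(&Rect(2.0, 3.0)));

    // Compare the size of a plain reference with the two wider kinds.
    println!(
        "sizes in bytes: plain {}, slice {}, trait object {}",
        size_of::<&i32>(),
        size_of::<&[i32]>(),
        size_of::<&dyn Shape>()
    );
}
```

In current Rust implementations both wider references occupy two machine words, one for the data pointer and one for the length or vtable.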