The basic idea behind references is simple. Instead of having a variable hold a value, we want the variable to hold a reference to some value. A reference is effectively the address of the memory location where the value is stored. Using this reference, we can access or change the value the reference points to.
Let's illustrate this with a simple example. The & operator obtains a reference to a variable, and therefore to the value it holds. The * operator can then be applied to the reference to get or change the value pointed to by the reference:
    mut a = 3
    imm ref = &a    // ref holds a reference to a's contents
    imm val = *ref  // val is now 3
    *ref = 4        // a is now set to 4
Benefits and Safety Risks
References offer important benefits:
- Shared access. Multiple references can point to the same value, such that all of them see and can change the latest value at that address. If one reference is used to change a value, another reference to that value sees the updated value. By contrast, values passed around between functions do not work this way, as passed values are copies. Any change to a copy leaves the original value unchanged.
- Space efficiency. Some data values can be large, such that making copies of them as they are passed around takes up a lot of space and wastes time in the copying. In these situations, it can be more efficient to pass around a reference to the large value.
- Speed of vectorization. If multiple values sit right next to each other in memory, it can be faster to use a reference to process those values sequentially than to work with each value separately, one at a time.
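The difference between passing a copy and passing a reference can be sketched in Rust. This is only an analogy, not Cone syntax, and the function names (bump_copy, bump_in_place, sum_all) are hypothetical. Rust also restricts mutable aliasing more tightly than the shared access described above, so the sketch shows only the copy-versus-reference contrast and sequential access over contiguous values.

```rust
// Rust analogy only; none of this is Cone syntax.

/// Receives a copy of the caller's value, so the change stays local.
fn bump_copy(mut n: i32) -> i32 {
    n += 1;
    n
}

/// Receives a mutable reference, so the change is visible to the caller.
fn bump_in_place(n: &mut i32) {
    *n += 1;
}

/// Walks contiguous values through one reference to the whole run.
fn sum_all(values: &[i32]) -> i32 {
    values.iter().sum()
}

fn main() {
    let mut a = 3;

    let copied = bump_copy(a);
    println!("after bump_copy: a = {}, returned = {}", a, copied);

    bump_in_place(&mut a);
    println!("after bump_in_place: a = {}", a);

    println!("sum = {}", sum_all(&[1, 2, 3, 4]));
}
```

After bump_copy the original variable still holds 3; only bump_in_place, which works through a reference, changes it.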
However, without appropriate constraints, the unfettered ability to access and change any arbitrary value based on its address opens up a program to significant safety risks:
- Memory unsafe. A pointer might point to an object that has been deleted and no longer exists. It could also point to an area of memory (e.g., NULL) that has no valid objects.
- Concurrency unsafe. One pointer might try to access an object while its contents are in the process of being changed by another pointer. Since the contents are in transition, they may not make sense to an outside pointer. Worse, two pointers might try to change the same object at the same time across multiple steps, corrupting it in the attempt.
- Type unsafe. A pointer might point to an object whose type differs from what the program expects (e.g., the program thought it was pointing at an integer, but there is a float there instead).
Such failures are not always easy to detect and eliminate when the responsibility for being careful lies entirely with fallible humans. Although the consequences are not always serious, they can be catastrophic, especially when malicious actors exploit undetected pointer problems to gain access to, or control over, other people's privileges or information.
Because of these safety risks, Cone explicitly distinguishes between references and pointers. Although both offer addressable access to values, references are constrained to prevent unsafe use. Pointers are not subject to such constraints, making them more versatile but also potentially more dangerous.
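Rust draws a similar line between checked references and unchecked raw pointers, which may help make the distinction concrete. The sketch below is a Rust analogy, not Cone syntax, and read_both is a hypothetical name. How Cone itself marks pointer use is not shown here.

```rust
// Rust analogy only; none of this is Cone syntax.

/// Reads the same value once through a reference and once through a raw pointer.
fn read_both() -> (i32, i32) {
    let a = 3;

    // A reference: the compiler guarantees it points at a live, correctly typed value.
    let r: &i32 = &a;
    let via_ref = *r;

    // A raw pointer: no such guarantee is tracked, so dereferencing it
    // must be wrapped in an `unsafe` block.
    let p: *const i32 = &a;
    // SAFETY: `p` was just created from `a`, which is still alive here.
    let via_ptr = unsafe { *p };

    (via_ref, via_ptr)
}

fn main() {
    let (via_ref, via_ptr) = read_both();
    println!("via reference = {}, via pointer = {}", via_ref, via_ptr);
}
```

Both reads return the same value. The difference is who carries the burden of proof: the compiler for the reference, the programmer for the pointer.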
Reference Mechanisms and Flavors
The way that Cone handles references is unusual among programming languages. Although many capabilities are similar to those found in other languages, Cone's distinctive approach optimizes for versatility, safety and ease of use.
Getting comfortable with Cone's references begins with understanding the four fundamental mechanisms that underlie their capabilities and constraints.
- Allocators. A reference's allocator is responsible for allocating new values. The allocator also tracks all allocated values, automatically disposing of any values no longer needed. Allocators are what make it possible for a Cone program to support multiple memory strategies, such as tracing GC, ref counting, escape-based RAII, arenas, pools, etc. In effect, allocators help ensure memory safety. A reference that has no allocator is a borrowed reference.
- Lifetimes. Cone automatically tracks the lifetime of every reference based on its lexical scope. Lifetimes are invaluable for ensuring that allocated objects are freed only when no reference points to them. They also ensure that borrowed references can never point to a freed value. In these ways, lifetimes also help improve memory safety.
- Permissions. Permissions govern what may be done with a reference. Each distinct permission grants and denies certain rights, such as the ability to read or change the value the reference points at, or the ability to alias (create a copy of) a reference. Permissions also govern the use of atomics, locks and other synchronization mechanisms. Permissions help ensure concurrency safety.
- Value type. The value type specifies the type of the value that the reference points to. References are often treated as simply a stand-in for a value of this type. The value type helps protect type safety.
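Rust expresses loosely comparable ideas, and a short sketch may help readers who know it. This is a Rust analogy under stated assumptions, not Cone syntax. Box and Rc stand in for two allocation strategies (single owner and ref counting), & and &mut stand in for two permissions, the 'a annotation stands in for a lifetime, and i32 is the value type. The names double, reset and larger are hypothetical. Cone's own mechanisms differ in detail.

```rust
// Rust analogy only; none of this is Cone syntax.
use std::rc::Rc;

/// Value type and permission: a read-only borrowed reference to an i32.
fn double(n: &i32) -> i32 {
    *n * 2
}

/// Permission: a mutable borrowed reference may change the value.
fn reset(n: &mut i32) {
    *n = 0;
}

/// Lifetime: the returned borrow may not outlive either input.
fn larger<'a>(x: &'a i32, y: &'a i32) -> &'a i32 {
    if x >= y { x } else { y }
}

fn main() {
    // Single-owner heap allocation, freed when `owned` goes out of scope.
    let mut owned = Box::new(21);
    println!("double = {}", double(&owned)); // borrow for reading
    reset(&mut owned);                       // borrow for writing
    println!("after reset = {}", owned);

    // Ref-counted allocation, freed when the last owner is dropped.
    let shared = Rc::new(5);
    let alias = Rc::clone(&shared);
    println!("owners = {}", Rc::strong_count(&shared));
    drop(alias);
    println!("owners = {}", Rc::strong_count(&shared));

    let (x, y) = (1, 2);
    println!("larger = {}", larger(&x, &y));
}
```

The borrowed references passed to double and reset never free anything. Only the owning Box and Rc decide when the value is released, which mirrors the split between allocator-backed and borrowed references described above.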
In addition to basic references, Cone also supports more complex reference types that carry more than a pointer:
- Array references, which point to a collection of identically-typed elements. These elements are contiguous in memory with the reference pointing to the first element. The number of elements is carried as part of the array reference.
- Interface references, which provide a generic view of some value defined by one of several possible struct types. In addition to the object pointer, the reference also points to runtime type information about the value's type, such as a vtable for dynamically dispatching method calls on the object.
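Rust slices and trait objects are similar "fat" references, so they can illustrate what carrying more than a pointer means. The sketch is a Rust analogy, not Cone syntax. Shape, Square, Rect and total_area are hypothetical names, and the size comparisons reflect how current Rust compilers lay out these references.

```rust
// Rust analogy only; none of this is Cone syntax.
use std::mem::size_of;

trait Shape {
    fn area(&self) -> f64;
}

struct Square(f64);
struct Rect(f64, f64);

impl Shape for Square {
    fn area(&self) -> f64 {
        self.0 * self.0
    }
}

impl Shape for Rect {
    fn area(&self) -> f64 {
        self.0 * self.1
    }
}

/// Each `&dyn Shape` carries an object pointer plus a vtable pointer,
/// so `area` is dispatched at run time on the concrete type.
fn total_area(shapes: &[&dyn Shape]) -> f64 {
    shapes.iter().map(|s| s.area()).sum()
}

fn main() {
    // A slice reference carries a pointer to the first element plus a length.
    let data = [10, 20, 30, 40];
    let tail: &[i32] = &data[1..];
    println!("first = {}, len = {}", tail[0], tail.len());

    // Both kinds of reference are two machine words instead of one.
    println!("plain ref: {} bytes", size_of::<&i32>());
    println!("slice ref: {} bytes", size_of::<&[i32]>());
    println!("trait ref: {} bytes", size_of::<&dyn Shape>());

    let sq = Square(2.0);
    let re = Rect(2.0, 3.0);
    let shapes: [&dyn Shape; 2] = [&sq, &re];
    println!("total area = {}", total_area(&shapes));
}
```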
The default condition for references is that they can only point to valid values. However, it is possible to also declare and create nullable references. A nullable reference can have the value null, which means the reference does not point to any valid value. Protective measures ensure that one can never dereference a nullable reference whose value is null.
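One way such protection can work is to make the null case a separate variant that must be checked before the reference is usable. Rust's Option<&T> does this, and the sketch below uses it as an analogy. It is not Cone syntax, and find_first_even and describe are hypothetical names. The protective measures Cone actually uses are not shown here.

```rust
// Rust analogy only; none of this is Cone syntax.
use std::mem::size_of;

/// Returns a reference to the first even element, or `None` if there is none.
fn find_first_even(values: &[i32]) -> Option<&i32> {
    values.iter().find(|v| **v % 2 == 0)
}

/// The reference inside the Option cannot be used without handling `None`.
fn describe(r: Option<&i32>) -> String {
    match r {
        Some(v) => format!("points at {}", v),
        None => String::from("null"),
    }
}

fn main() {
    println!("{}", describe(find_first_even(&[1, 4, 6])));
    println!("{}", describe(find_first_even(&[1, 3])));

    // The nullable form costs no extra space: `None` is stored as the null address.
    println!(
        "{} bytes vs {} bytes",
        size_of::<Option<&i32>>(),
        size_of::<&i32>()
    );
}
```

Because the compiler rejects any attempt to dereference the Option directly, the null check cannot be forgotten, and the non-nullable &i32 type stays free of null entirely.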