It is now time to introduce a different kind of type: references. The basic idea is simple: references carry an address that points to some value in memory. Using this reference, we can access or change the value the reference points to.
    mut a = 3
    imm ref = &mut a  // ref holds a reference to a's contents
    imm val = *ref    // val is now 3
    *ref = 4          // a is now set to 4
The & operator obtains a reference to the variable a, effectively pointing to the value it holds. The * (de-referencing) operator is used either to read the value the reference points to (as in imm val = *ref) or to change it (as in *ref = 4).
References are very useful, as they support:
- Shared access to the same value. Multiple references may point to the same value. Any of them can see (and possibly change) the latest value at that address. If one reference changes a value, another sees the updated value. By contrast, values copied around between functions do not work this way: changing a copy leaves the original value unchanged.
- Dynamic heap allocation. We can dynamically allocate new data values on the heap whose lifetime is independent of the local execution stack. References are the handles we use to access these data values. When all references to a dynamically-allocated value expire, so does the value.
- Performance efficiency. For non-trivial data structures, it is faster to pass around a reference to the data rather than making a full copy for each function that needs it.
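The difference between shared access and copying can be sketched in code. The example below is written in Rust, not Cone, purely as an analogy: the function names (`bump_copy`, `bump_in_place`, `demo_shared_access`) are hypothetical, and Cone's own syntax and rules may differ.

```rust
// Analogy in Rust (not Cone): pass-by-copy versus pass-by-reference.

/// Receives a copy of the caller's value; the caller's variable is untouched.
fn bump_copy(mut n: i32) -> i32 {
    n += 1;
    n
}

/// Receives a mutable reference; the change is visible to the caller.
fn bump_in_place(n: &mut i32) {
    *n += 1;
}

/// Starts from 3 and returns:
/// (original after the copy call, result of the copy call, original after the reference call).
fn demo_shared_access() -> (i32, i32, i32) {
    let mut a = 3;
    let returned = bump_copy(a); // `a` is duplicated for the call
    let after_copy = a;          // the original is unchanged
    bump_in_place(&mut a);       // modifies `a` through the reference
    (after_copy, returned, a)
}
```

Calling `demo_shared_access()` shows that only the call made through a reference changes the original variable.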
Memory Management and Safety
Cone's distinctive references are more than just "pointers". They make fine-grained control over memory convenient and safe.
Let's briefly review how region-based memory management, powered by references, can improve throughput, responsiveness, and memory efficiency. References also play an important role in ensuring memory and race safety.
Region-based Memory Management
In addition to global and local (stack-based) data, it is often desirable to dynamically allocate memory from the "heap" for new data objects. When memory is allocated and initialized, an owning reference is created that points to the new data object. This reference, or copies of it, may easily be used to work with this data. When the last usable reference to an allocated object expires, the object's memory is automatically recovered. This is called automatic memory management.
Many different memory management strategies exist, such as tracing garbage collection, reference counting, single-owner, arenas, and pools. No single strategy is perfect for all needs. Their trade-offs vary widely in throughput, latency, memory use and leakage, data structure flexibility, and ease of use.
Instead of restricting a program to the limitations of only one memory management strategy, as most languages do, Cone dynamically partitions memory into regions, each with its own approach to memory allocation and collection. For every new data object, Cone allows the programmer to specify, at allocation time, which region the object belongs to, choosing the region whose strategy best matches the object's usage profile.
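To make the idea of choosing a strategy per allocation more concrete, here is a rough analogy in Rust rather than Cone. Rust's standard `Box` (single owner) and `Rc` (reference counting) stand in for two different regions; this is only a sketch of the concept, not Cone's region syntax, and the function names are hypothetical.

```rust
use std::rc::Rc;

/// Single-owner allocation: the heap array is freed as soon as `boxed`
/// goes out of scope at the end of this function.
fn single_owner_sum() -> i32 {
    let boxed: Box<[i32; 3]> = Box::new([1, 2, 3]);
    boxed.iter().sum()
}

/// Reference-counted allocation: the value lives until the last handle is dropped.
/// Returns the handle count seen with two handles, then after dropping one.
fn ref_counted_handles() -> (usize, usize) {
    let first = Rc::new(String::from("shared"));
    let second = Rc::clone(&first); // a second handle to the same allocation
    let with_two = Rc::strong_count(&first);
    drop(second);                   // one handle expires
    let with_one = Rc::strong_count(&first);
    (with_two, with_one)
}
```

In both functions the strategy is picked where the object is allocated, which is the same decision point the paragraph above describes for regions.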
Memory and Race Safety
References are also different from raw pointers in terms of safety. Although versatile, raw pointers can be very risky to use. Incorrect use can cause hard-to-detect program bugs. Although the consequences are not always serious, they can be catastrophic, especially when malicious actors take advantage of undetected pointer problems to gain access to or control over other people's privileges or information.
By contrast, usage constraints inherent to references ensure they can only be used safely:
- Region, lifetime, initialization, bounds, and subtyping constraints ensure that every reference always points to valid, live data of the expected type. This is called memory safety.
- Permission constraints ensure that use of one reference cannot corrupt the integrity of data also accessible by another reference, even when those references are used by concurrent threads. This is called race safety.
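As a loose illustration of race safety, the sketch below uses Rust (not Cone) with a lock-guarded counter shared across threads. Rust's `Arc` and `Mutex` are stand-ins for the kind of synchronized access a permission can govern; the function `locked_counter` is hypothetical and Cone's permission mechanism may look quite different.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Spawns `threads` workers that each add 1 to a shared counter `per_thread` times.
/// The mutex ensures no increment is lost to a data race.
fn locked_counter(threads: usize, per_thread: usize) -> usize {
    let counter = Arc::new(Mutex::new(0usize));
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let counter = Arc::clone(&counter); // each thread gets its own handle
            thread::spawn(move || {
                for _ in 0..per_thread {
                    *counter.lock().unwrap() += 1; // exclusive access while locked
                }
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    let total = *counter.lock().unwrap();
    total
}
```

Because every update goes through the lock, the final total equals `threads * per_thread` regardless of how the threads interleave.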
To accomplish these memory management and safety responsibilities, references are composed of four largely independent mechanisms. These are specified (often implicitly) as part of a reference's type signature:
- Region. Every reference points to data owned by a specific memory region. A region knows how to allocate memory for all the data it owns. It also knows how to automatically reclaim memory for any owned data when no accessible reference points to it. Regions allow a program to make use of multiple memory management strategies. An "owned" reference specifies its owning region as part of its type signature. A "borrowed" reference is unaware of which region owns the data it points to.
- Lifetime. A lifetime is some span of time in the execution of a program, often bounded by a lexical scope. In a very real way, the lifetime of an allocated data object is determined by the aggregate lifetime of all the usable references that point to it. Often, references need not specify lifetimes explicitly, as they can be inferred from context.
- Permission. A reference's permission governs what may be done with it. Each distinct permission grants and denies certain rights, such as the ability to read or change the value the reference points at, or the ability to make a copy of a reference. Permissions also govern the use of atomics, locks and other synchronization mechanisms.
- Value type. The value type specifies the type of the value that the reference points to. References are often treated as simply a stand-in for a value of this type.
In addition to basic references, there are also special-purpose "fat" references which are handled somewhat differently:
- Array references point to a collection of identically-typed elements. These elements are contiguous in memory with the reference pointing to the first element. The number of elements is carried as part of the array reference. Such references may be indexed and are subject to bounds checks.
- Virtual references offer an abstract view of some value instantiated by one of several variant types. In addition to pointing at the value's data, the reference also points to runtime type information that facilitates field access or method dispatch.
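The two kinds of "fat" reference have close counterparts in Rust, which can serve as an analogy (this is Rust, not Cone). A Rust slice reference carries a pointer plus an element count, and a trait-object reference carries a pointer plus a pointer to dispatch information. The `Shape`, `Square`, and `Rect` types and the helper functions below are hypothetical names chosen for illustration.

```rust
use std::mem::size_of;

trait Shape {
    fn area(&self) -> f64;
}

struct Square(f64);
struct Rect(f64, f64);

impl Shape for Square {
    fn area(&self) -> f64 { self.0 * self.0 }
}

impl Shape for Rect {
    fn area(&self) -> f64 { self.0 * self.1 }
}

/// Bounds-checked access through a slice reference (pointer + length).
/// Returns None instead of reading past the end.
fn element_at(items: &[i32], index: usize) -> Option<i32> {
    items.get(index).copied()
}

/// Dynamic dispatch through trait-object references (pointer + dispatch table).
fn total_area(shapes: &[&dyn Shape]) -> f64 {
    shapes.iter().map(|s| s.area()).sum()
}

/// Sizes in bytes of a plain reference, a slice reference, and a trait-object reference.
fn reference_sizes() -> (usize, usize, usize) {
    (size_of::<&i32>(), size_of::<&[i32]>(), size_of::<&dyn Shape>())
}
```

In Rust, both fat references occupy two machine words where a plain reference occupies one, which is what "fat" refers to: extra information travels alongside the address.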
This page introduced many reference concepts quickly, perhaps overwhelmingly so. Let's slow down and walk through each of them carefully with examples and helpful detail. Importantly, you do not have to know everything about references to start using them productively.