A source program's UTF-8 encoded characters are grouped together to form a stream of tokens. These tokens are the fundamental elements of Cone's grammar. Each token's type is determined by its initial character:

Additionally, Cone supports block and statement inference, auto-generating curly braces and semi-colon tokens based on line indentation and composition. Comments and whitespace characters are allowed but ignored, only serving to separate one token from the next.

Numeric Literal

A numeric digit (from '0' to '9') starts an integer or float literal. Although a negative sign ('-') preceding a numeric literal is not considered part of the token (as that dash might be a minus sign), it still has the desired effect of negating the number that follows. Using a negative sign on an unsigned integer literal does not make it a signed one.

Underscores may be used within a numeric literal to improve readability.

Integer Literal

An Integer literal may be:

By default, an integer literal is a signed 32-bit number. To change this, specify one of the following suffixes:

Float Literal

A float literal also starts with a digit from '0' to '9'. To distinguish it from an integer literal, it must contain a decimal point, an exponent ('E' or 'e'), or a type suffix that explicitly declares it to be a float ('f', 'd', 'f32' or 'f64'). A period is considered to be a decimal point if it is unambiguously not being used as part of a range operator, which is to say that the period is not immediately followed by another period.

The Float token may specify an exponent, which is indicated by an 'e' or 'E' followed by an optional minus sign and additional numeric digits.
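The rules above for telling a float from an integer can be sketched in Python. This is only an illustration, not Cone's actual lexer: the name scan_number is hypothetical, only decimal literals are handled, and integer type suffixes are left out.

```python
DIGITS = "0123456789"
# Float suffixes named in the text, longest first so 'f32' wins over 'f'.
FLOAT_SUFFIXES = ("f32", "f64", "f", "d")


def scan_number(src, pos=0):
    """Return (kind, text, end) for the decimal literal starting at src[pos].

    Hypothetical helper: a sketch of the rules in the text, not Cone's lexer.
    """
    def skip_digits(i):
        # Underscores are allowed inside a numeric literal for readability.
        while i < len(src) and (src[i] in DIGITS or src[i] == "_"):
            i += 1
        return i

    if pos >= len(src) or src[pos] not in DIGITS:
        raise ValueError("a numeric literal must start with a digit")

    is_float = False
    i = skip_digits(pos)

    # A period is a decimal point only when it does not begin the '..' operator.
    if src.startswith(".", i) and not src.startswith("..", i):
        is_float = True
        i = skip_digits(i + 1)

    # Exponent: 'e' or 'E', an optional minus sign, then digits.
    if i < len(src) and src[i] in "eE":
        j = i + 1
        if src.startswith("-", j):
            j += 1
        if j < len(src) and src[j] in DIGITS:
            is_float = True
            i = skip_digits(j)

    # A float suffix forces the literal to be a float.
    for suffix in FLOAT_SUFFIXES:
        if src.startswith(suffix, i):
            is_float = True
            i += len(suffix)
            break

    return ("float" if is_float else "int"), src[pos:i], i
```

With this sketch, scanning "1..5" stops after the "1" and reports an integer, because the period belongs to the range operator.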

By default, a float literal is a 32-bit number. To change this, specify one of the following suffixes:

Character Literal

A character literal begins and ends with a single quote ('), within which must be found a single UTF-8 encoded Unicode character. Its type is u32: an unsigned, 32-bit integer.

Any character whose Unicode value is 0x0020 or higher can be specified explicitly. Alternatively, one of the following escape sequences that begin with '\' may be used:

\a
Alarm (U+0007)
\b
Backspace (U+0008)
\f
Form feed (U+000C)
\n
New-line (U+000A)
\r
Return (U+000D)
\t
Tab (U+0009)
\v
Vertical tab (U+000B)
\\
Backslash (\)
\'
Single quote (')
\"
Double quote (")
\0
Null character (U+0000)
\xnn
Byte with the specified two-digit hexadecimal value.
\unnnn
Unicode character that matches the specified four-digit hexadecimal code point.
\Unnnnnnnn
Unicode character that matches the specified eight-digit hexadecimal code point.
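As an illustration, the escape sequences listed above could be decoded along these lines. The function name decode_escapes is hypothetical, the sketch treats \xnn as a code point in the range 0 to 255, and rejecting unknown escapes is an assumption rather than documented Cone behavior.

```python
import string

# Single-character escapes from the table above.
SIMPLE_ESCAPES = {
    "a": "\a", "b": "\b", "f": "\f", "n": "\n", "r": "\r", "t": "\t",
    "v": "\v", "\\": "\\", "'": "'", '"': '"', "0": "\0",
}
# Escapes followed by a fixed number of hexadecimal digits.
HEX_ESCAPES = {"x": 2, "u": 4, "U": 8}


def decode_escapes(text):
    """Replace backslash escapes in text (hypothetical helper, a sketch only)."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch != "\\":
            out.append(ch)
            i += 1
            continue
        if i + 1 >= len(text):
            raise ValueError("backslash at end of text")
        code = text[i + 1]
        if code in SIMPLE_ESCAPES:
            out.append(SIMPLE_ESCAPES[code])
            i += 2
        elif code in HEX_ESCAPES:
            width = HEX_ESCAPES[code]
            digits = text[i + 2:i + 2 + width]
            if len(digits) != width or any(d not in string.hexdigits for d in digits):
                raise ValueError("expected %d hexadecimal digits" % width)
            out.append(chr(int(digits, 16)))
            i += 2 + width
        else:
            # Assumption: escapes not in the table are treated as errors.
            raise ValueError("unknown escape sequence: \\" + code)
    return "".join(out)
```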

String Literal

String literals offer multiple techniques for specifying text content. All allow specification of multiple Unicode characters, but they vary in the handling of a few special characters as needed for different circumstances.

A null character (U+0000) is always appended to the end of a string literal, for C compatibility. All string literals are treated as immutable.

Escaped vs. Raw Text

Many times it is convenient to use escape sequences in order to visibly include control and unicode characters. Other times, such as with regular expressions or XML text, it is less error-prone and more readable to be able to specify backslashes or double quotes without having to escape them with backslashes.

The first characters of the string literal establish how escape sequences are handled:

If a string literal begins with a single double-quote (or backtick), it ends with the next double quote (or backtick). If the string literal begins with a triple double-quote, it ends with a triple double-quote. If there are more than three double-quotes at the end, the terminator is the last three:

""""Happy Birthday!""""   // yields the string literal: "Happy Birthday!"

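The "last three quotes terminate" rule for triple-quoted literals can be sketched as follows. This is an illustrative Python helper with a hypothetical name, scan_triple_quoted. It ignores escape sequences entirely and assumes any run of fewer than three double quotes is ordinary content.

```python
def scan_triple_quoted(src, pos=0):
    """Return (content, end) for the triple-quoted literal starting at src[pos].

    Hypothetical helper: a sketch of the terminator rule, with no escape handling.
    """
    if not src.startswith('"""', pos):
        raise ValueError("expected an opening triple double-quote")
    i = pos + 3
    content = []
    while i < len(src):
        if src[i] != '"':
            content.append(src[i])
            i += 1
            continue
        # Measure the whole run of consecutive double quotes.
        run_end = i
        while run_end < len(src) and src[run_end] == '"':
            run_end += 1
        run = run_end - i
        if run >= 3:
            # The last three quotes terminate; any extras belong to the content.
            content.append('"' * (run - 3))
            return "".join(content), run_end
        content.append('"' * run)
        i = run_end
    raise ValueError("unterminated string literal")
```

Applied to the four-quote example above, the sketch returns the text "Happy Birthday!" with one double quote kept on each side.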
Multi-line String Literals

Some string literals are long enough to require multiple lines of code to specify. It would be convenient to be able to format such content properly using indentation and readable margins, without that formatting necessarily carrying over into the text of the literal.

These are the rules that make that possible:

For example:

// Equivalent to "a\nb"
"
   a
   b\
   "

URL Literal

As described later, the '@' operator returns the value of the resource whose URL follows as a text value. The resource is loaded at compile-time if the text is a literal value enclosed in double quotes. Otherwise, the resource whose URL is specified by a parenthesis-wrapped text expression will be loaded at run-time.

As a convenience, it is possible to specify the URL's literal value without surrounding it with double quotes. This happens when the character that follows the '@' is not a double quote ("), open parenthesis ('('), space, tab, lf, cr, or eof. The end of the URL happens when encountering the first white space: space, tab, lf, cr or eof. This non-quoted URL is a URL literal.
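A minimal Python sketch of that rule follows. The name scan_url_literal is hypothetical, and end-of-file is modeled simply as running out of characters.

```python
# Characters that end a URL literal (end-of-file is handled by the length check).
URL_TERMINATORS = " \t\n\r"


def scan_url_literal(src, pos=0):
    """Return (url, end) for an unquoted URL after '@', or None if there is none.

    Hypothetical helper: src[pos] must be the '@' operator.
    """
    if not src.startswith("@", pos):
        raise ValueError("expected '@'")
    start = pos + 1
    # A quote or parenthesis means a quoted literal or a run-time expression;
    # white space or end-of-file means there is no URL at all.
    if start >= len(src) or src[start] in '"(' + URL_TERMINATORS:
        return None
    end = start
    while end < len(src) and src[end] not in URL_TERMINATORS:
        end += 1
    return src[start:end], end
```

For instance, scanning "@http://example.com/a.txt next" (an invented URL) yields "http://example.com/a.txt" and stops at the space.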

Identifier

Other than the reserved keywords, a program may define and use any identifier as a variable, member, function, method, type, etc.

Typically, an identifier begins with a letter, '$' or '_'. A letter may be 'a'-'z', 'A'-'Z', or any unicode-defined universal letter as defined by C99 in ISO/IEC 9899:1999(E) Appendix D. Identifiers are case-sensitive; 'abc' is different from 'ABC'.

Subsequent characters may be letters, digits, '$', or '_'.

To include other characters, such as punctuation, as part of an identifier, enclose the entire identifier in back-ticks.

The following are all valid identifiers:

balance toReturn True _temp_ $ π `*`

Note: '_' by itself (not followed by a letter or a number) is not an identifier, but a special punctuation token.
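The identifier rules can be approximated in Python as shown below. The name scan_identifier is hypothetical. Python's str.isalpha() is used as a stand-in for the C99 universal letter list, which it does not match exactly, and only a lone '_' is treated as punctuation.

```python
def is_ident_start(ch):
    # Approximation: str.isalpha() stands in for the C99 universal letters.
    return ch.isalpha() or ch in "$_"


def is_ident_part(ch):
    return ch.isalpha() or ch in "0123456789$_"


def scan_identifier(src, pos=0):
    """Return (name, end) for the identifier at src[pos], or None if there is none.

    Hypothetical helper: a sketch of the rules in the text, not Cone's lexer.
    """
    if pos >= len(src):
        return None
    if src[pos] == "`":
        # Back-ticks allow any characters; the name excludes the back-ticks.
        close = src.find("`", pos + 1)
        if close == -1:
            raise ValueError("unterminated back-tick identifier")
        return src[pos + 1:close], close + 1
    if not is_ident_start(src[pos]):
        return None
    end = pos + 1
    while end < len(src) and is_ident_part(src[end]):
        end += 1
    name = src[pos:end]
    if name == "_":
        # A lone underscore is a punctuation token, not an identifier.
        return None
    return name, end
```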

Keywords

These keywords are reserved and may not be used as identifiers:

and
logical 'and' operator used in boolean expressions
async
asynchronous execution
baseurl
url of the program's source code
break
terminate a loop block
context
The value of the currently executing execution state
continue
Re-iterate a 'while' or 'for' block
do
A block that performs an automatic '.begin' and '.end'
each
The block for iterating over a collection of values
else
A clause within an 'if' statement
elif
A clause within an 'if' statement
false
The value of 'false'
if
A conditional block or clause
in
A clause within an 'each' block
into
A clause within a 'match/with' block
local
ensures variable(s) are treated as locally scoped
match
Matches a calculated value to several possible values
new
Creates a new instance
null
The value of 'null'
not
logical 'not' operator used in boolean expressions
or
logical 'or' operator used in boolean expressions
return
terminate execution of a method with a return value
self
references the method's self parameter value
selfmethod
the currently executing method or closure
this
The value of the most inclusive 'this' block
true
The value of 'true'
using
clause on this block
wait
A block that waits until all its execution contexts are done.
while
A repetitive block or clause
with
A clause in a 'match' block
yield
suspend a generator with a return value

Operator and Precedence

An operator is a sequence of one or more punctuation characters with no intervening white space. The compiler is greedy, and will look for the longest character sequence that matches one of these operators.
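Greedy matching of this kind is often implemented by trying the longest candidates first. The Python sketch below assumes a partial operator table taken from the lists in this section. The names OPERATORS and scan_operator are hypothetical.

```python
# A subset of the operators listed in this section (not an exhaustive table).
OPERATORS = [
    "**", "*", "/", "%", "+", "-", "..", ".", ".:", "::",
    "==", "!=", "===", "~~", "<=>", "<", "<=", ">", ">=",
    "!", "&&", "||", "?", ":", "<<", ">>",
    ",", "=", ":=", "+=", "-=", "*=", "/=", "@", ";",
]
# Trying longer operators first makes the match greedy.
BY_LENGTH = sorted(OPERATORS, key=len, reverse=True)


def scan_operator(src, pos=0):
    """Return the longest operator found at src[pos], or None if none matches."""
    for op in BY_LENGTH:
        if src.startswith(op, pos):
            return op
    return None
```

Under this sketch, "===x" yields '===' rather than '==' followed by '=', and "<=5" yields '<=' rather than '<'.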

The operators are listed from highest to lowest evaluation priority. Operators grouped together have the same priority. The parentheses show whether an operator appears in front of a value (prefix, 'p') or between values (infix, 'i'); some operators can be used in both contexts. They also give the symbolic name of any method associated with that operator.

Value operators

( )
(p) prioritizes expression to be evaluated as a group
[ ]
(p) array
{ }
(p) code block

Term operators

+ new
(p 'new') new instance
.
(i) method call/property access
.:
(i) property access
::
(i '[]') indexed access
( )
(i) method call parameters
[ ]
(i '[]') indexed access

Prefix operators

-
(p '@-') negate
@
(p) Internet resource load
<<
(p '<<') append to 'this'
>>
(p '>>') prepend to 'this'

Arithmetic/Collection operators

**
(i '**') exponent
*
(i '*') multiply
/
(i '/') divide/split
%
(i '%') remainder
+
(i '+') add/concatenate
-
(i '-') subtract

Range operator

..
(i) creates a range

Evaluation operators

==
(i '<=>') equal
!=
(i '<=>') not equal
===
(i) equivalent
~~
(i '~~') match
<=>
(i '<=>') compare (rocketship)
<
(i '<=>') less than
<=
(i '<=>') less than or equal
>
(i '<=>') greater than
>=
(i '<=>') greater than or equal

Logical operators

! not
(p) logical not
&& and
(i) logical and
|| or
(i) logical or

Ternary operator

? :
(i) "if .. then .. else ..." expression

Append/Prepend operators

<<
(i '<<') append
>>
(i '>>') prepend

Assignment operators

Note: Within variable declarations, the evaluation priorities of ',' and '=' are reversed.

,
(i) value separation
=
(i) assign
:
(i) 'this' property assign
:=
(i) '[]' 'this' index set
+=
(i '+') add in place
-=
(i '-') subtract in place
*=
(i '*') multiply in place
/=
(i '/') divide in place

Statement terminator

The semicolon ; is the lowest priority "operator".

Block and Statement Inference

Most languages, such as C, require explicit denotation of blocks (enclosed within curly braces) and statements (separated by semi-colons). A few languages, such as Python and Nim, use the off-side rule to infer blocks based on changes to line indentation. Likewise, each new line in a block is considered to start a new statement. The benefit of inferred blocks and statements is code readability: it is more compact vertically and less cluttered with punctuation.

Cone supports both styles. The default mode is to infer blocks and statements. However, inference is turned off for all code enclosed within curly braces. So, when you explicitly use curly braces, the compiler assumes you will continue to be explicit about denoting blocks with curly braces and statement separators with semicolons.

In most cases, the inference is simple and obvious:

So:

a=1
while a<4
	wander(a)
	if outside?
		a = 2
	a = 3

Is the same as:

a=1; while a<4 {wander(a); if outside? {a = 2;}; a = 3;};

As the latter example shows, single-line brevity is allowed. Use semicolons to pack multiple statements onto a line. Likewise, wrap a block in curly braces to place it on the same line as the statement that introduces it.

Statement and block inference are governed by specific lexical rules.

Lines and Statement Inference

Lines are separated by the line-feed (U+000A) character. The carriage return (U+000D) character is ignored and plays no part as a line separator. Line numbering begins with 1.

Typically, each new line is assumed to be a new statement. To signal to the parser that the previous statement is finished, lexical analysis auto-injects a semi-colon token prior to processing tokens in the new line.

A new line is not considered to be a new statement when it has no code content (it is blank or just comment documentation) or it is a continuation of the previous line's statement. Continuation may be explicitly specified by ending the statement's line with a backslash or starting the next line (after the correct indentation) with a backslash.

For example:

a = 
\ b or
\ c       // equivalent to: a = b or c

Line Indentation and Blocks

A line's indentation is the number of either spaces or tabs at the beginning of a line. A program should consistently use only spaces or only tabs for indentation. Indentation matters only when a line starts a new statement; it is ignored for lines that are blank, contain only a comment, or which continue an existing statement.

Increasing line indentation automatically injects '{' to signal the start of a new block. Decreasing line indentation to a prior indentation level (or end-of-file) injects '}' to signal the end of a block.
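The statement and block inference rules can be sketched together in Python. This is a simplified illustration with a hypothetical function name, infer_tokens. It treats each line's code as one opaque chunk, recognizes only whole-line '//' comments, does not strip trailing comments, and does not model inference being switched off inside explicit curly braces.

```python
def infer_tokens(source):
    """Return line chunks interleaved with inferred '{', '}' and ';' tokens.

    Hypothetical helper: a sketch of the inference rules, not Cone's lexer.
    """
    out = []
    levels = [0]        # stack of currently open indentation levels
    continued = False   # True when the previous code line ended with a backslash
    for raw in source.split("\n"):
        line = raw.replace("\r", "")          # carriage returns are ignored
        stripped = line.lstrip(" \t")
        text = stripped.rstrip()
        if text == "" or text.startswith("//"):
            continue                          # blank or comment-only line
        indent = len(line) - len(stripped)
        leading = text.startswith("\\")
        if leading:
            text = text[1:].lstrip()
        trailing = text.endswith("\\")
        if trailing:
            text = text[:-1].rstrip()
        if (continued or leading) and out:
            # Continuation: extend the previous statement, ignore indentation.
            out[-1] = (out[-1] + " " + text).strip()
        else:
            if out:
                if indent > levels[-1]:
                    levels.append(indent)
                    out.append("{")           # deeper indentation opens a block
                else:
                    out.append(";")           # end the previous statement
                    while indent < levels[-1]:
                        levels.pop()
                        out += ["}", ";"]     # close the block and its statement
                    if indent != levels[-1]:
                        raise ValueError("indentation matches no open block")
            out.append(text)
        continued = trailing
    if out:
        out.append(";")
        while len(levels) > 1:                # end-of-file closes open blocks
            levels.pop()
            out += ["}", ";"]
    return out
```

Run on the indented 'while' example from earlier in this section, the sketch produces the same token sequence as the explicit single-line form, including the ';' that follows each closing '}'.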

Comments

Comments make the code's logic easier to understand and maintain. They are for people. They have no impact on program execution.

Comments may be placed between any two tokens (but not within a text literal).

There are two types of comments:

White Space

Spaces are regularly used to improve code clarity and separate tokens. Outside of text or symbols, spaces are ignored. Other control or unexpected characters, such as tabs not at the start of the line or the carriage return (U+000D), are also ignored.

If the program source begins with the UTF-8 byte-order mark (U+FEFF), it is ignored.

End-of-File

The program code ends when it reaches a null character (U+0000), end-of-file character (U+001A) or has no more characters. The lexer will tidy up whatever is unfinished (e.g., still open blocks, literals, or comments) and then will generate the end-of-file token for the parser.
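Taken together with the byte-order mark rule, the extent of the code that the lexer actually scans could be computed roughly as follows. This is a Python sketch operating on already-decoded text, and the name effective_source is hypothetical.

```python
def effective_source(text):
    """Return the part of text that is scanned for tokens (hypothetical helper)."""
    # A leading byte-order mark (U+FEFF) is ignored.
    if text.startswith("\ufeff"):
        text = text[1:]
    # The program ends at the first null (U+0000) or end-of-file (U+001A)
    # character, or when the characters run out.
    for i, ch in enumerate(text):
        if ch in "\x00\x1a":
            return text[:i]
    return text
```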