Nim

From Bauman National Library
This page was last modified on 1 June 2016, at 17:03.
Nim
Paradigm: multi-paradigm (imperative, object-oriented)
Developer: Andreas Rumpf
First appeared: 2008
Stable release: 0.13.0 / 18 January 2016
Typing discipline: static
License: MIT
Website: http://nim-lang.org/
Influenced by: Lisp, Python, C, Object Pascal, Ada, Modula-3

Nim (formerly named Nimrod) is an imperative, multi-paradigm, compiled programming language designed and developed by Andreas Rumpf. It is designed to be "efficient, expressive, and elegant", and supports metaprogramming, functional, message-passing, procedural, and object-oriented programming styles through features such as compile-time code generation, algebraic data types, a foreign function interface (FFI) to C, and compilation to JavaScript.

Initially, the Nim compiler was written in Pascal. In 2008, a version of the compiler written in Nim was released. The compiler is open source and is developed by a group of volunteers in addition to Andreas Rumpf. It generates optimized C code and defers compilation to an external C compiler (a wide range is supported, including clang and GCC) to take advantage of their optimization and portability. The compiler can also generate C++ and Objective-C code, which eases interfacing with APIs written in those languages; this in turn allows Nim to be used to write iOS as well as Android applications.

Description

Nim is statically typed, with a simple syntax. It supports compile-time metaprogramming features such as syntactic macros and term rewriting macros. Term rewriting macros allow libraries to implement common data structures such as bignums and matrices as efficiently as if they were built-in language facilities. Iterators are supported and, like functions, can be used as first-class entities; together these features enable a functional programming style. Object-oriented programming is supported through inheritance and multiple dispatch. Functions can be generic and can be overloaded, and generics are further enhanced by support for type classes. Operator overloading is also supported. Nim includes automatic garbage collection based on deferred reference counting with cycle detection.
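A minimal sketch of three of the features named above: operator overloading, generic procedures, and procedures passed as first-class values. The names Vec2, sumAll, applyTwice and twice are hypothetical and chosen only for illustration; the code assumes a recent Nim compiler and may need adjusting for the 0.13-era release described here.

```nim
type
  Vec2 = object          # hypothetical type, for illustration only
    x, y: float

# Operator overloading: `+` is an ordinary proc with an operator name.
proc `+`(a, b: Vec2): Vec2 =
  Vec2(x: a.x + b.x, y: a.y + b.y)

# Generic proc: usable with any type T for which `+` is defined.
proc sumAll[T](values: openArray[T], start: T): T =
  result = start
  for v in values:
    result = result + v

# Procs are first-class values and can be passed as arguments.
proc applyTwice(f: proc (n: int): int, n: int): int =
  f(f(n))

proc twice(n: int): int = n * 2

let total = sumAll([Vec2(x: 1.0, y: 2.0), Vec2(x: 3.0, y: 4.0)], Vec2(x: 0.0, y: 0.0))
echo total.x, " ", total.y
echo sumAll([1, 2, 3], 0)
echo applyTwice(twice, 5)
```

The same sumAll works for int and for Vec2 because the compiler instantiates it once per element type and resolves `+` for each instantiation.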

Definitions

A Nim program specifies a computation that acts on a memory consisting of components called locations. A variable is basically a name for a location. Each variable and location is of a certain type. The variable's type is called static type, the location's type is called dynamic type. If the static type is not the same as the dynamic type, it is a super-type or subtype of the dynamic type.

An identifier is a symbol declared as a name for a variable, type, procedure, etc. The region of the program over which a declaration applies is called the scope of the declaration. Scopes can be nested. The meaning of an identifier is determined by the smallest enclosing scope in which the identifier is declared unless overloading resolution rules suggest otherwise.

An expression specifies a computation that produces a value or location. Expressions that produce locations are called l-values. An l-value can denote either a location or the value the location contains, depending on the context. Expressions whose values can be determined statically are called constant expressions; they are never l-values.
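The distinction between l-values and constant expressions can be sketched as follows. The names limit, counter and bump are hypothetical, and the commented-out line is the one the compiler would be expected to reject.

```nim
const limit = 2 * 21      # constant expression, evaluated at compile time
var counter = 0           # `counter` names a location

counter = limit           # left side: the location; right side: a value
inc counter               # `inc` modifies its argument, so it needs an l-value

proc bump(x: var int) =
  # A `var` parameter accepts only l-values.
  x += 1

bump(counter)             # fine: `counter` is an l-value
# bump(limit)             # static error: a constant is never an l-value
echo counter
```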

A static error is an error that the implementation detects before program execution. Unless explicitly classified, an error is a static error.

A checked runtime error is an error that the implementation detects and reports at runtime. The method for reporting such errors is via raising exceptions or dying with a fatal error. However, the implementation provides a means to disable these runtime checks. See the section pragmas for details.

Whether a checked runtime error results in an exception or in a fatal error at runtime is implementation specific. Thus the following program is invalid, because it relies on the out-of-bounds access being reported as a catchable exception:

var a: array[0..1, char]
let i = 5
try:
  a[i] = 'N'
except IndexError:
  echo "invalid index"

An unchecked runtime error is an error that is not guaranteed to be detected, and can cause the subsequent behavior of the computation to be arbitrary. Unchecked runtime errors cannot occur if only safe language features are used.

Lexical Analysis

Encoding

All Nim source files are in the UTF-8 encoding (or its ASCII subset). Other encodings are not supported. Any of the standard platform line termination sequences can be used - the Unix form using ASCII LF (linefeed), the Windows form using the ASCII sequence CR LF (return followed by linefeed), or the old Macintosh form using the ASCII CR (return) character. All of these forms can be used equally, regardless of platform.

Indentation

Nim's standard grammar describes an indentation-sensitive language. This means that all the control structures are recognized by indentation. Indentation consists only of spaces; tab characters are not allowed.

The indentation handling is implemented as follows: The lexer annotates the following token with the preceding number of spaces; indentation is not a separate token. This trick allows parsing of Nim with only 1 token of lookahead.

The parser uses a stack of indentation levels: the stack consists of integers counting the spaces. The indentation information is queried at strategic places in the parser but ignored otherwise. The pseudo terminal IND{>} denotes an indentation that consists of more spaces than the entry at the top of the stack; IND{=} denotes an indentation that has the same number of spaces. DED is another pseudo terminal that describes the action of popping a value from the stack; IND{>} correspondingly implies pushing a value onto the stack.

With this notation we can now easily define the core of the grammar: A block of statements (simplified example):

ifStmt = 'if' expr ':' stmt
         (IND{=} 'elif' expr ':' stmt)*
         (IND{=} 'else' ':' stmt)?

simpleStmt = ifStmt / ...

stmt = IND{>} stmt ^+ IND{=} DED  # list of statements
     / simpleStmt                 # or a simple statement
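The stack discipline behind IND{>}, IND{=} and DED can be sketched in a few lines. This is a simplified illustration, not the compiler's actual parser: the names IndKind, leadingSpaces and indentEvents are hypothetical, every non-blank line is treated as one token, and a dedent to a level that was never pushed is not reported as an error.

```nim
import strutils

type
  IndKind = enum
    indGreater,  # IND{>}: deeper than the top of the stack, so push
    indEqual,    # IND{=}: same depth as the top of the stack
    dedent       # DED: pop one level from the stack

proc leadingSpaces(line: string): int =
  ## Counts the spaces in front of the first token of `line`.
  while result < line.len and line[result] == ' ':
    inc result

proc indentEvents(source: string): seq[IndKind] =
  ## Hypothetical helper: lists the pseudo terminals seen line by line.
  result = @[]
  var stack = @[0]                      # indentation levels, in spaces
  for line in source.splitLines:
    if line.strip.len == 0: continue    # blank lines carry no indentation
    let n = leadingSpaces(line)
    if n > stack[^1]:
      stack.add n                       # IND{>} implies a push
      result.add indGreater
    else:
      while n < stack[^1]:
        discard stack.pop               # each DED pops one level
        result.add dedent
      result.add indEqual

echo indentEvents("if x:\n  a\n  b\nc")
```

For the four-line input, the second line opens a block (one push) and the last line closes it (one pop), which mirrors how `stmt` in the grammar brackets a statement list between IND{>} and DED.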

Comments

Comments start anywhere outside a string or character literal with the hash character #. Comments consist of a concatenation of comment pieces. A comment piece starts with # and runs until the end of the line. The end of line characters belong to the piece. If the next line only consists of a comment piece with no other tokens between it and the preceding one, it does not start a new comment:

i = 0     # This is a single comment over multiple lines.
  # The scanner merges these two pieces.
  # The comment continues here.

Documentation comments are comments that start with two hash characters (##). They are tokens and are allowed only at certain places in the input file, because they belong to the syntax tree.
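A short sketch of the two comment kinds side by side; the proc square is a hypothetical example. The documentation comment sits directly under the proc header, which is one of the places where such a token is allowed.

```nim
proc square(x: int): int =
  ## Returns `x` multiplied by itself.
  ## Documentation comments like these are attached to the proc
  ## and can be picked up by the documentation generator.
  result = x * x  # an ordinary comment, not part of the syntax tree

echo square(7)
```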

Identifiers & Keywords

Identifiers in Nim can be any string of letters, digits and underscores, beginning with a letter. Two immediately following underscores (__) are not allowed:

letter ::= 'A'..'Z' | 'a'..'z' | '\x80'..'\xff'
digit ::= '0'..'9'
IDENTIFIER ::= letter ( ['_'] (letter | digit) )*

Currently any Unicode character with an ordinal value > 127 (non ASCII) is classified as a letter and may thus be part of an identifier but later versions of the language may assign some Unicode characters to belong to the operator characters instead.

The following keywords are reserved and cannot be used as identifiers:

addr and as asm atomic
bind block break
case cast concept const continue converter
defer discard distinct div do
elif else end enum except export
finally for from func
generic
if import in include interface is isnot iterator
let
macro method mixin mod
nil not notin
object of or out
proc ptr
raise ref return
shl shr static
template try tuple type
using
var
when while with without
xor
yield

Some keywords are unused; they are reserved for future developments of the language.

Identifier equality

Two identifiers are considered equal if the following algorithm returns true:

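import re, strutils  # re for the regex `replace`, strutils for `toLower`

proc sameIdentifier(a, b: string): bool =
  # First character: exact match; remainder: ignore case, '_' and en-dash.
  a[0] == b[0] and
    a.replace(re"_|–", "").toLower == b.replace(re"_|–", "").toLower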

That means only the first letters are compared in a case-sensitive manner. The other letters are compared case-insensitively, and underscores and en-dashes (Unicode code point U+2013) are ignored.
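The rule can be tried out with a dependency-free variant of the algorithm. This is a sketch under stated assumptions: sameIdent is a hypothetical helper, it uses toLowerAscii from strutils (the spelling found in recent Nim releases), and it leaves out the en-dash case for brevity.

```nim
import strutils

proc sameIdent(a, b: string): bool =
  ## Hypothetical helper: first character compared exactly, the rest
  ## compared after dropping underscores and ignoring ASCII case.
  if a.len == 0 or b.len == 0:
    return a.len == b.len
  a[0] == b[0] and
    a.replace("_", "").toLowerAscii == b.replace("_", "").toLowerAscii

doAssert sameIdent("fooBar", "foo_bar")
doAssert sameIdent("fooBar", "fooBAR")
doAssert not sameIdent("foo", "Foo")   # first letters differ in case

# The compiler applies the same rule to real identifiers:
var lineCount = 3
line_count = 4        # refers to the same variable as lineCount
echo lineCount
```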

This rather unorthodox way to do identifier comparisons is called partial case insensitivity and has some advantages over the conventional case sensitivity:

It allows programmers to mostly use their own preferred spelling style, be it humpStyle, snake_style or dash–style and libraries written by different programmers cannot use incompatible conventions. A Nim-aware editor or IDE can show the identifiers as preferred. Another advantage is that it frees the programmer from remembering the exact spelling of an identifier. The exception with respect to the first letter allows common code like var foo: Foo to be parsed unambiguously.

Historically, Nim was a fully style-insensitive language. This meant that it was not case-sensitive, underscores were ignored, and there was not even a distinction between foo and Foo.

Numerical constants

Numerical constants are of a single type and have the form:

hexdigit = digit | 'A'..'F' | 'a'..'f'
octdigit = '0'..'7'
bindigit = '0'..'1'
HEX_LIT = '0' ('x' | 'X' ) hexdigit ( ['_'] hexdigit )*
DEC_LIT = digit ( ['_'] digit )*
OCT_LIT = '0' ('o' | 'c' | 'C') octdigit ( ['_'] octdigit )*
BIN_LIT = '0' ('b' | 'B' ) bindigit ( ['_'] bindigit )*

INT_LIT = HEX_LIT
        | DEC_LIT
        | OCT_LIT
        | BIN_LIT

INT8_LIT = INT_LIT ['\''] ('i' | 'I') '8'
INT16_LIT = INT_LIT ['\''] ('i' | 'I') '16'
INT32_LIT = INT_LIT ['\''] ('i' | 'I') '32'
INT64_LIT = INT_LIT ['\''] ('i' | 'I') '64'

UINT_LIT = INT_LIT ['\''] ('u' | 'U')
UINT8_LIT = INT_LIT ['\''] ('u' | 'U') '8'
UINT16_LIT = INT_LIT ['\''] ('u' | 'U') '16'
UINT32_LIT = INT_LIT ['\''] ('u' | 'U') '32'
UINT64_LIT = INT_LIT ['\''] ('u' | 'U') '64'

exponent = ('e' | 'E' ) ['+' | '-'] digit ( ['_'] digit )*
FLOAT_LIT = digit (['_'] digit)* (('.' (['_'] digit)* [exponent]) |exponent)
FLOAT32_SUFFIX = ('f' | 'F') ['32']
FLOAT32_LIT = HEX_LIT '\'' FLOAT32_SUFFIX
            | (FLOAT_LIT | DEC_LIT | OCT_LIT | BIN_LIT) ['\''] FLOAT32_SUFFIX
FLOAT64_SUFFIX = ( ('f' | 'F') '64' ) | 'd' | 'D'
FLOAT64_LIT = HEX_LIT '\'' FLOAT64_SUFFIX
            | (FLOAT_LIT | DEC_LIT | OCT_LIT | BIN_LIT) ['\''] FLOAT64_SUFFIX

As can be seen in the productions, numerical constants can contain underscores for readability. Integer and floating point literals may be given in decimal (no prefix), binary (prefix 0b), octal (prefix 0o or 0c) and hexadecimal (prefix 0x) notation.
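A few literals that follow these productions, as a sketch; the variable names are hypothetical. The 0c octal prefix is left out because its availability in newer compilers is not something this example relies on.

```nim
let
  million = 1_000_000      # underscores are ignored
  mask    = 0xFF           # hexadecimal
  flags   = 0b1010         # binary
  perms   = 0o755          # octal
  small   = 100'i8         # int8 through a type suffix
  wide    = 42'u64         # uint64 through a type suffix
  kilo    = 1e3            # float literal with an exponent
  half    = 0.5'f32        # float32 through a type suffix

echo million, " ", mask, " ", flags, " ", perms
echo small, " ", wide, " ", kilo, " ", half
```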

Types

All expressions have a type which is known at compile time. Nim is statically typed. One can declare new types, which is in essence defining an identifier that can be used to denote this custom type. These are the major type classes:

  • ordinal types (consist of integer, bool, character, enumeration (and subranges thereof) types)
  • floating point types
  • string type
  • structured types
  • reference (pointer) type
  • procedural type
  • generic type
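The type classes listed above can be illustrated with one small declaration each. All type and variable names here are hypothetical, and the sketch assumes a recent Nim compiler.

```nim
type
  Color = enum                        # ordinal type: enumeration
    red, green, blue
  Percent = range[0..100]             # ordinal type: subrange of int
  Point = object                      # structured type
    x, y: float
  Node = ref object                   # reference type
    value: int
    next: Node
  BinaryOp = proc (a, b: int): int    # procedural type
  Pair[T] = tuple[first, second: T]   # generic type

let
  c: Color = green
  p: Percent = 75
  ratio: float = 0.75                 # floating point type
  name: string = "nim"                # string type
  origin = Point(x: 0.0, y: 0.0)
  head = Node(value: 1, next: nil)
  plus: BinaryOp = proc (a, b: int): int = a + b
  pair: Pair[int] = (first: 1, second: 2)

echo ord(c), " ", p, " ", ratio, " ", name
echo origin.x, " ", head.value, " ", plus(2, 3), " ", pair.second
```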

Examples

The following code examples are valid as of Nim 0.10.2. Syntax and semantics may change in subsequent versions.

Hello world

echo "Hello World!"

Reversing a string

proc reverse(s: string): string =
  result = "" # implicit result variable
  for i in countdown(high(s), 0):
    result.add s[i]

var str1 = "Reverse This!"
echo "Reversed: ", reverse(str1)

This example shows several of Nim's features. One of the more unusual ones is the implicit result variable: every procedure in Nim with a non-void return type has an implicit result variable that represents the value that will be returned. The for loop invokes countdown, which is an iterator. If no iterator is named in a for loop, the compiler attempts to use an items iterator, provided one is defined for the type of the expression being iterated over.
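The implicit items lookup can be sketched with a user-defined type. Steps and gather are hypothetical names chosen for this illustration; the for loop names no iterator, so the compiler is expected to pick up the items iterator defined for Steps.

```nim
type
  Steps = object        # hypothetical type, for illustration only
    start: int

iterator items(s: Steps): int =
  # Yields start, start - 1, ..., 0.
  var i = s.start
  while i >= 0:
    yield i
    dec i

proc gather(s: Steps): seq[int] =
  result = @[]          # the implicit result variable again
  for n in s:           # no iterator named: items(s) is used
    result.add n

echo gather(Steps(start: 3))
```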

Metaprogramming

template genType(name, fieldname: expr, fieldtype: typedesc) =
  type
    name = object
      fieldname: fieldtype

genType(Test, foo, int)

var x = Test(foo: 4566)
echo(x.foo) # 4566

This is an example of metaprogramming in Nim using its template facilities. The genType template is expanded at compile time and creates the Test type.

Wrapping C functions

proc printf(formatstr: cstring)
  {.header: "<stdio.h>", varargs.}

printf("%s %d\n", "foo", 5)

Existing C code can directly be used in Nim. In this code the well known printf function is imported into Nim and subsequently used.
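The same approach works for C functions that return values. The sketch below imports abs and atoi from <stdlib.h>; cAbs is a hypothetical Nim-side name mapped to the C symbol through importc. It assumes the C backend (for example `nim c`), since the JavaScript backend has no C headers to include.

```nim
# The string after importc is the C symbol; the Nim-side name may differ.
proc cAbs(x: cint): cint {.importc: "abs", header: "<stdlib.h>".}

# With a bare importc, the Nim name is used as the C symbol.
proc atoi(s: cstring): cint {.importc, header: "<stdlib.h>".}

echo cAbs(-7)
echo atoi("42") + 1
```

Using the C-compatible types cint and cstring keeps the Nim signature aligned with the C prototype.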

Libraries

A Nim program can use any library that can be used in a C program. Language bindings exist for many libraries, for example GTK+2, SDL2, Cairo, OpenGL, WinAPI, zlib, libzip, OpenSSL and cURL. Nim works with PostgreSQL, MySQL and SQLite databases. Nim can interface with the Lua and Python interpreters. The tool c2nim helps to generate new bindings from C code.

Community

The language has a bug tracker and a wiki, both hosted on GitHub, and a forum. A presentation on Nim was scheduled for the O'Reilly Open Source Convention (OSCON) in 2015 (O'Reilly Community: "Essential Languages: Nim, Scala, Python").

References

  1. Official Nim documentation
  2. Nim on GitHub