Katahdin

From Bauman National Library
This page was last modified on 1 June 2016, at 17:11.
Paradigm: Multi-paradigm
Designed by: Chris Seaton
First appeared: 2007
Typing discipline: Static, Dynamic
License: BSD
Website: http://chrisseaton.com/katahdin/

Katahdin is a programming language whose syntax and semantics can be changed at runtime. It uses a parsing expression grammar (PEG) and a packrat parser. New constructs, such as expressions or statements, or even an entirely new language, can be defined from scratch. The only implementation of the language runs on the .NET platform (including the Mono implementation).

Introduction

Katahdin lets you introduce new language constructs as easily as conventional programming languages let you define functions and data types. For example, many programming languages have a modulo operation (mod). If your language lacks this operation and offers no way to overload an operator for it, adding it means studying the language implementation, changing the grammar, and building a new translator. In Katahdin the remainder operation can be described in a few lines:

class ModExpression : Expression {
    pattern {
        option leftRecursive;
        a:Expression "%" b:Expression
    }

    method Get() {
        a = this.a.Get...();
        b = this.b.Get...();
        return a - (b * (a / b));
    }
}

This code is executed at runtime, and the remainder operation becomes part of the language immediately afterwards, so it can be used in the rest of the program as soon as it has been declared. Katahdin contains no constructs that cannot be changed; even whitespace and keywords are no exception.

If every construct of a programming language is described in this way, Katahdin becomes a host for that language, so it can serve as a universal interpreter. Describing two languages in Katahdin also makes it possible to use one inside the other. Such descriptions currently exist for Fortran and Python. Embedding is useful in practice: SQL, for example, is widely used from other languages, usually as code written inside strings. Katahdin lets you combine the host language with SQL so that SQL becomes part of the host language itself.

Sometimes it is also desirable to define a new construct at runtime. For example, Java 1.5 added the for-each statement, but until that implementation was released programmers could not use it. With Katahdin, programmers could have used the construct without waiting for the next version of Java. A statement is defined in the same way as an expression; the definition of a conventional for statement looks like this:

class ForStatement : Statement {
    pattern {
        "for" "(" init:Expression ";"
            cond:Expression ";" inc:Expression ")"
        body:Statement
    }

    method Run() {
        init.Get...();
        while (cond.Get...()) {
            body.Run...();
            inc.Get...();
        }
    }
}
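The Run method above amounts to rewriting a for statement as a while loop. A minimal Python sketch of the same idea follows. The node classes (Expr, Block, ForStatement) and the explicit env dictionary are hypothetical illustrations and are not part of Katahdin; the only assumption taken from the text is that expressions expose a get operation and statements expose a run operation.

```python
# Hypothetical mini-interpreter: the names here are illustrative, not Katahdin's API.

class Expr:
    """An expression node wrapping a Python callable over a shared environment."""
    def __init__(self, fn):
        self.fn = fn

    def get(self, env):
        return self.fn(env)


class Block:
    """A statement node that evaluates a list of expressions in order."""
    def __init__(self, *exprs):
        self.exprs = exprs

    def run(self, env):
        for expr in self.exprs:
            expr.get(env)


class ForStatement:
    """for (init; cond; inc) body, expressed with a while loop."""
    def __init__(self, init, cond, inc, body):
        self.init, self.cond, self.inc, self.body = init, cond, inc, body

    def run(self, env):
        self.init.get(env)
        while self.cond.get(env):
            self.body.run(env)
            self.inc.get(env)


def sum_below(limit):
    """Run `for (i = 0; i < limit; i++) total += i` and return total."""
    env = {"total": 0}
    loop = ForStatement(
        init=Expr(lambda e: e.update(i=0)),
        cond=Expr(lambda e: e["i"] < limit),
        inc=Expr(lambda e: e.update(i=e["i"] + 1)),
        body=Block(Expr(lambda e: e.update(total=e["total"] + e["i"]))),
    )
    loop.run(env)
    return env["total"]
```

Calling sum_below(5) runs the loop body for i = 0 through 4 and returns 10.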

Grammar

Pattern description

Katahdin's grammar is a modified parsing expression grammar (PEG). The notation is close to that of a context-free grammar (CFG). Literal text is written in quotation marks ("text") and may contain the standard escape sequences. A range of characters is expressed with the .. operator: "a".."z". The . operator matches any single character. Other constructs are referred to by name (Value). A sequence is written as items separated by spaces: a b c. The operators ?, * and + mean zero-or-one, zero-or-more and one-or-more, respectively. Expressions can be grouped in parentheses to control precedence. The operators & and ! test the text without consuming it: & succeeds if the expression matches, and ! succeeds if it does not. Alternatives are written with the | operator. Unlike standard PEGs, which use the ordered-choice operator /, Katahdin selects the longest matching alternative, and the order in which the alternatives are listed does not affect their priority.

Universal end of line:
    "\r" "\n"? | "\n"
Parameter list:
    "(" (Value ("," Value)*)? ")"
C-style line comment:
    "//" (!EndOfLine .)*
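The operators described above can be sketched with a handful of parser combinators. This is only an illustration in Python, not Katahdin's implementation: every function name below is hypothetical, a parser is modelled as a function from (text, pos) to the new position or None, and the | operator is assumed to pick the longest matching alternative, in contrast to the ordered choice of a standard PEG.

```python
# Illustrative parser combinators; a parser maps (text, pos) to the new
# position on success or None on failure. None of these names come from Katahdin.

def lit(s):
    return lambda text, pos: pos + len(s) if text.startswith(s, pos) else None

def any_char(text, pos):
    # The "." operator: any single character.
    return pos + 1 if pos < len(text) else None

def seq(*parsers):
    def parse(text, pos):
        for p in parsers:
            pos = p(text, pos)
            if pos is None:
                return None
        return pos
    return parse

def star(p):
    # The "*" operator: zero or more repetitions.
    def parse(text, pos):
        while True:
            nxt = p(text, pos)
            if nxt is None or nxt == pos:
                return pos
            pos = nxt
    return parse

def opt(p):
    # The "?" operator: zero or one occurrence.
    def parse(text, pos):
        nxt = p(text, pos)
        return pos if nxt is None else nxt
    return parse

def not_ahead(p):
    # The "!" operator: succeeds without consuming input when p fails.
    return lambda text, pos: pos if p(text, pos) is None else None

def ordered(*parsers):
    # Standard PEG "/": the first alternative that matches wins.
    def parse(text, pos):
        for p in parsers:
            nxt = p(text, pos)
            if nxt is not None:
                return nxt
        return None
    return parse

def longest(*parsers):
    # Assumed behaviour of "|": the longest match wins, whatever the order.
    def parse(text, pos):
        ends = [e for e in (p(text, pos) for p in parsers) if e is not None]
        return max(ends) if ends else None
    return parse

# "\r" "\n"? | "\n"
end_of_line = longest(seq(lit("\r"), opt(lit("\n"))), lit("\n"))
# "//" (!EndOfLine .)*
line_comment = seq(lit("//"), star(seq(not_ahead(end_of_line), any_char)))
```

On the input "ab", ordered(lit("a"), lit("ab")) stops after one character, while longest(lit("a"), lit("ab")) consumes both, which is why the order of alternatives does not matter under longest-match selection.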

Precedence

Precedence determines the order in which operations are performed when an expression is evaluated. In mathematics and in most programming languages, * and / have higher precedence than + and -. Precedence can be encoded directly in the grammar, but then adding a new operation requires changing other rules of the language, and if those rules are defined in another module, the programmer cannot know exactly which rules have to change. Katahdin therefore encourages describing operations in a natural way and declaring their precedence afterwards with the precedence keyword:

class AddExpression : Expression {
    pattern {
        option leftRecursive;
        Expression '+' Expression
    }
}

class SubExpression : Expression {
    pattern {
        option leftRecursive;
        Expression '-' Expression
    }
}

precedence SubExpression = AddExpression;

class MulExpression : Expression {
    pattern {
        option leftRecursive;
        Expression '*' Expression
    }
}

class DivExpression : Expression {
    pattern {
        option leftRecursive;
        Expression '/' Expression
    }
}

precedence DivExpression = MulExpression;
precedence MulExpression > AddExpression;
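To illustrate why declaring precedence separately from the operators is convenient, here is a small Python sketch of a precedence-climbing evaluator driven by a table that is filled in after the operators are defined. It is not how Katahdin is implemented; the OPERATORS and PRECEDENCE tables and the helper functions are hypothetical names chosen for this example.

```python
import operator

# Hypothetical registry: each operator is defined on its own, and precedence
# levels are assigned afterwards, loosely mirroring the precedence statements.
OPERATORS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}
PRECEDENCE = {}

def same_precedence(op, other):
    PRECEDENCE[op] = PRECEDENCE[other]

def higher_precedence(op, other):
    PRECEDENCE[op] = PRECEDENCE[other] + 1

PRECEDENCE["+"] = 0
same_precedence("-", "+")      # precedence SubExpression = AddExpression;
higher_precedence("*", "+")    # precedence MulExpression > AddExpression;
same_precedence("/", "*")      # precedence DivExpression = MulExpression;

def evaluate(tokens):
    """Evaluate a flat token list such as [1, '+', 2, '*', 3]."""
    pos = 0

    def parse(min_level):
        nonlocal pos
        left = tokens[pos]
        pos += 1
        while pos < len(tokens) and PRECEDENCE[tokens[pos]] >= min_level:
            op = tokens[pos]
            pos += 1
            # Left-associative operators: the right operand may only
            # contain operators that bind more tightly than op.
            right = parse(PRECEDENCE[op] + 1)
            left = OPERATORS[op](left, right)
        return left

    return parse(0)
```

With this table, evaluate([1, "+", 2, "*", 3]) yields 7, and adding a new operator only requires one new entry in each table rather than a rewrite of the existing rules.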

Advantages of the language

Common runtime

At present, writing a system in several different languages is rather difficult. Each language requires its own runtime environment, which has to be chosen, possibly purchased, and installed. Each environment must be periodically updated and extended. When the platform changes, every environment must be ported. Katahdin provides a single environment for executing code written in different languages, so when the platform changes only the Katahdin environment has to be ported.

Simple language interoperability

Interaction between languages that run in different execution environments is inherently difficult. Data can be exchanged by setting up a read/write channel. Sharing code (for example, functions or types) is harder still and requires tools such as CORBA or COM. In a Katahdin program, code and data are shared freely between the languages in use, with no bindings or I/O required. An example using Fortran and Python in a single source file:

import "fortran.kat";
import "python.kat";

fortran {
    SUBROUTINE RANDOM(SEED, RANDX)

    INTEGER SEED
    REAL RANDX

    SEED = 2045*SEED + 1
    SEED = SEED - (SEED/1048576)*1048576
    RANDX = REAL(SEED + 1)/1048577.0
    RETURN

    END
}

python {
    seed = 128
    randx = 0

    for n in range(5):
        RANDOM(seed, randx)
        print randx
}
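For reference, the arithmetic performed by the RANDOM subroutine can be written out in plain Python. This sketch is not part of the Katahdin example: the function names are hypothetical, and it assumes Fortran's pass-by-reference behaviour, where both SEED and RANDX are updated in the caller, by returning the two values explicitly.

```python
MODULUS = 1048576  # 2**20, as in the Fortran source

def random_step(seed):
    """Return (new_seed, randx) for one call of the RANDOM subroutine."""
    seed = 2045 * seed + 1
    # Fortran's integer division truncates, so this line computes
    # seed mod 2**20 for the non-negative seeds used here.
    seed = seed - (seed // MODULUS) * MODULUS
    randx = float(seed + 1) / 1048577.0
    return seed, randx

def run(seed=128, count=5):
    """Mirror the Python loop in the example, collecting the randx values."""
    values = []
    for _ in range(count):
        seed, randx = random_step(seed)
        values.append(randx)
    return values
```

Starting from a seed of 128, the first call produces the new seed 261761, and every randx value lies strictly between 0 and 1.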

Katahdin is a programming language designed as a tool that does not restrict the programmer's choice of platform or libraries. Each part of a program, responsible for a specific task, can be written in whichever language suits it best (depending on the task, the developers, and the available libraries).

For example, a Fortran program designed for numerical calculations may find it awkward to do the text processing needed for I/O. That part is better written in a more suitable language, such as Perl. In a Katahdin program, the text-handling code written in Perl can live in the same file as the Fortran program.

Besides using modules that describe standard languages, Katahdin lets you extend and change those languages. Even a programmer who does not plan to design language extensions can benefit from this, for example by implementing a feature that is due to appear in the next version of a language but is wanted now.

Design

Although Katahdin can be viewed as a generic runtime for any language, a base language is provided for implementing the standard library and writing other language definitions. The language should therefore be useable by programmers coming from as many different languages as possible and be powerful and flexible enough to express the semantics of many paradigms and languages. This language is:

  • Free-form. Languages such as Python and occam express scope using the off-side rule, where indentation increases with the scope depth and statements are terminated by a line break. Programmers either love or hate this style, so it is not used here.
  • Curly-braced. Beginning with BCPL in the 1960s, curly braces {} have always been used to express scope by the leading languages of the day, including C, C++, Java and C#. Almost all programmers will have experience of curly braces.
  • Imperative. PEGs can be conceived as an imperative grammar so it is natural that the language is also imperative. Most programmers will be familiar with the imperative paradigm.
  • Object-oriented. Object-orientation is a design well understood by programmers and implemented by many languages. As will be shown, the design of the Katahdin grammar could also be described as object-oriented.
  • Dynamically typed, also known as runtime or latent typing. As in most interpreted or scripting languages, Katahdin variables and functions are not typed. Objects carry their type with them, and type compatibility is resolved at the point of execution of an operator by duck-typing. Objects are automatically converted between types as needed. Katahdin has to support dynamically and statically typed languages, and dynamic typing seemed the most general of the two typing disciplines.
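The last point, that type compatibility is resolved only when an operator executes, can be illustrated with a short Python sketch. It shows duck typing in general and makes no claim about Katahdin's actual conversion rules; the Meters class and the add function are hypothetical.

```python
class Meters:
    """A value that is not a number but knows how to behave like one."""
    def __init__(self, value):
        self.value = value

    def __float__(self):
        return float(self.value)


def add(a, b):
    # No declared types: compatibility is checked only when the operator runs,
    # and operands are converted as needed (here, to float).
    return float(a) + float(b)
```

Here add(1, "2.5") and add(Meters(2), 3) both succeed because each operand can be converted when the addition runs, whereas add("x", 1) fails only at that point, with a ValueError.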

Programming language or generator

Katahdin is a programming language rather than a parser generator. The language is imperative, object-oriented, duck-typed and curly-braced. Programmers familiar with C, Java, C#, Python or almost any popular imperative language should have no trouble using Katahdin. New language constructs are defined as a special case of defining a new class. The semantics of the construct are implemented by defining methods in the class, using existing language constructs. This class is the complete syntactic and semantic definition of a new exponentiation operator:

class Pow : Expression {
    pattern {
        a:Expression '^' b:Expression
    }

    method Get() {
        return System.Math.Pow(@a.Get(), @b.Get());
    }
}

The Get() method is defined for all expressions. To implement the semantics of the exponentiation operator we call Get() for both of the operands and then call the library method System.Math.Pow(). Which methods a construct has depends on convention. Expressions have Get() and Set() methods. If the Set() method is left out, the expression cannot be assigned to: the default Set() implementation in Expression throws an exception. Other constructs have appropriate methods; for example, all statements have a Run() method. This class shows the Run() method being overridden to implement a for-loop statement:

class ForLoop : Statement {
    pattern {
        "for" "(" init:Expression ";"
            cond:Expression ";" inc:Expression ")"
        body:Statement
    }

    method Run() {
        init.Get();
        while (cond.Get()) {
            body.Run...();
            inc.Get();
        }
    }
}

The body.Run...(); notation performs a call in the calling method’s scope. This is so that references to variables in the for loop’s body are resolved in the scope of the method which contains the for loop, and not in the scope of the method ForLoop.Run().
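The difference between an ordinary call and a call in the caller's scope can be sketched in Python by passing scopes around explicitly as dictionaries. This is an analogy only: the Assign and Repeat classes are hypothetical, and Katahdin's own scoping machinery is not shown in the source.

```python
class Assign:
    """Statement `name = value`, run against whatever scope it is given."""
    def __init__(self, name, value):
        self.name, self.value = name, value

    def run(self, scope):
        scope[self.name] = self.value


class Repeat:
    """Hypothetical construct that runs its body `times` times."""
    def __init__(self, times, body):
        self.times, self.body = times, body

    def run_in_own_scope(self, caller_scope):
        # Ordinary call: the body sees only this method's local scope,
        # so its assignments are lost when the method returns.
        local_scope = {}
        for _ in range(self.times):
            self.body.run(local_scope)

    def run_in_caller_scope(self, caller_scope):
        # Analogue of body.Run...(): the body runs in the caller's scope,
        # so its variable references resolve where the construct was written.
        for _ in range(self.times):
            self.body.run(caller_scope)
```

After run_in_own_scope the caller's dictionary is unchanged, while after run_in_caller_scope it contains the variable assigned by the body, which is the behaviour a loop body needs.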

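A construct can also embed another language. The following class defines a statement that runs a Python statement inside a python { } block:

import "python.kat";

class PythonStatement : Statement {
    pattern {
        "python" "{"
            statement:Python.Statement "}"
    }

    method Run() {
        statement.Run();
    }
}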

Example

import "sql.kat";

class SqlExpression : Expression {
    pattern {
        option recursive = false;
        database:Expression "?"
        statement:Sql.Statement
    }

    method Get() {
        // Evaluate the operands
        database = @database.Get...();
        sql = @statement.Get...();

        // Execute the command
        command = database.CreateCommand();
        command.CommandText = sql;
        reader = command.ExecuteReader();

        // Read each row into a list
        rows = [];
        while (reader.Read()) {
            row = [];
            for (n = 0; n < reader.FieldCount; n++)
                row.Add(reader.GetValue(n));
            rows.Add(row);
        }

        return rows;
    }
}

children = contactDatabase? select name
    from contacts where age < 18;
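The row-reading loop in Get() has a close counterpart in most database APIs. As a rough comparison, the following Python sketch does the same thing with the standard sqlite3 module in place of the .NET command and reader objects. The function names, the table contents and the in-memory database are made up for illustration.

```python
import sqlite3

def query_rows(database, sql):
    """Execute sql on an open connection and return rows as a list of lists."""
    cursor = database.execute(sql)
    field_count = len(cursor.description)
    rows = []
    for record in cursor:
        row = []
        for n in range(field_count):
            row.append(record[n])
        rows.append(row)
    return rows

def demo():
    # In-memory database with made-up contacts, purely for illustration.
    contact_database = sqlite3.connect(":memory:")
    contact_database.execute("CREATE TABLE contacts (name TEXT, age INTEGER)")
    contact_database.executemany(
        "INSERT INTO contacts VALUES (?, ?)",
        [("Ann", 12), ("Bob", 40), ("Cy", 17)],
    )
    children = query_rows(
        contact_database,
        "SELECT name FROM contacts WHERE age < 18 ORDER BY name",
    )
    contact_database.close()
    return children
```

In this sketch the query is still passed as a string, which is exactly the step the Katahdin SqlExpression removes by making the SQL statement part of the host language's syntax.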

The article is written by Makarov D. V.