Redcode

From Bauman National Library
This page was last modified on 8 June 2016, at 18:58.

Redcode[1] is the programming language used in Core War. It is executed by a virtual machine known as a Memory Array Redcode Simulator, or MARS. The design of Redcode is loosely based on actual CISC assembly languages of the early 1980s, but contains several features not usually found in actual computer systems.

Both Redcode and the MARS environment are designed to provide a simple and abstract platform without the complexity of actual computers and processors. Although Redcode is meant to resemble an ordinary CISC assembly language, it differs in many ways from "real" assembly.

The Redcode instruction set

The number of instructions in Redcode has grown with each new standard, from the original number of about 5 to the current 18 or 19. And this doesn't even include the new modifiers and addressing modes that allow literally hundreds of combinations. Luckily, we don't need to learn all the combinations. It is enough to remember the instructions, and how the modifiers change them.

Here is a list of all the instructions used in Redcode:

  • DAT -- data (kills the process)
  • MOV -- move (copies data from one address to another)
  • ADD -- add (adds one number to another)
  • SUB -- subtract (subtracts one number from another)
  • MUL -- multiply (multiplies one number with another)
  • DIV -- divide (divides one number with another)
  • MOD -- modulus (divides one number with another and gives the remainder)
  • JMP -- jump (continues execution from another address)
  • JMZ -- jump if zero (tests a number and jumps to an address if it's 0)
  • JMN -- jump if not zero (tests a number and jumps if it isn't 0)
  • DJN -- decrement and jump if not zero (decrements a number by one, and jumps unless the result is 0)
  • SPL -- split (starts a second process at another address)
  • CMP -- compare (same as SEQ)
  • SEQ -- skip if equal (compares two instructions, and skips the next instruction if they are equal)
  • SNE -- skip if not equal (compares two instructions, and skips the next instruction if they aren't equal)
  • SLT -- skip if lower than (compares two values, and skips the next instruction if the first is lower than the second)
  • LDP -- load from p-space (loads a number from private storage space)
  • STP -- save to p-space (saves a number to private storage space)
  • NOP -- no operation (does nothing)

Redcode is quite a bit different from more ordinary assembly languages, which results from its abstract nature.

Tactic Programs

The Imp

The truth is, the most important parts of Redcode are the easiest ones. Most of the basic warrior types were invented before the new instructions and modes existed. The simplest, and probably the first, Core War program is the Imp, published by A. K. Dewdney in the original 1984 Scientific American article that first introduced Core War to the public.

        MOV 0, 1

Yes, that's it. Just one lousy MOV. But what does it do? MOV of course copies an instruction. You should recall that all addresses in Core War are relative to the current instruction, so the Imp in fact copies itself to the instruction just after itself.

        MOV 0, 1         ; this was just executed
        MOV 0, 1         ; this instruction will be executed next

Now, the Imp will execute the instruction it just wrote! Since it's exactly the same as the first one, it will once again copy itself one instruction forward, execute the copy, and continue to move forward while filling the core with MOVs. Since the core has no actual end, the Imp, after filling the whole core, reaches its starting position again and keeps on running happily in circles ad infinitum.

So the Imp actually creates it's own code as it executes it! In Core War, self-modification is a rule rather than an exception. You need to be effective to be successful, and that nearly always means changing your code on the fly. Luckily, the abstract environment makes this a lot easier to follow than in ordinary assembly.

It should be obvious that there are no caches in Core War. Well, actually the current instruction is cached so you can't modify it in the middle of executing it, but maybe we should leave all that for the later...

The Dwarf

The Imp has one little drawback as a warrior. It won't win too many games, since when it overwrites another warrior, it too starts to execute the MOV 0, 1 and becomes an imp itself, resulting in a tie. To kill a program, you'd have to copy a DAT over its code.

This is just what another classic warrior by Dewdney, the Dwarf, does. It "bombs" the core at regularly spaced locations with DATs, while making sure it won't hit itself.

       ADD #4, 3        ; execution begins here
       MOV 2, @2
       JMP -2
       DAT #0, #0

Actually, this isn't precisely what Dewdney wrote, but it works exactly the same way. The execution begins again at the first instruction. This time it's an ADD. The ADD instruction adds the source and the destination together, and puts the result in the destination. If you're familiar with other assembly languages, you may recognise the # sign as a way of marking immediate addressing. That is, the ADD adds the number 4 to the instruction at address 3, instead of adding the instruction 4 to the instruction 3. Since the 3rd instruction after the ADD is the DAT, the result will be:

       ADD #4, 3
       MOV 2, @2        ; next instruction
       JMP -2
       DAT #0, #4

If you add two instructions together, both the A- and the B-fields will be added together independently of each other. If you add a single number to an instruction, it will by default be added to the B-field. It's quite possible to use a # in the B-field of the ADD too. Then the A-field would be added to the B-field of the ADD itself.

The immediate addressing mode may seem simple and familiar, but the new modifiers in the ICWS '94 standard will give it an entirely new twist. But let's look at the Dwarf first.

The MOV once again presents us with yet another addressing mode: the @ or the indirect addressing mode. It means that the DAT will not be copied on itself as it seems (what good would that be?), but on the instruction its B-field points to, like this:

       ADD #4, 3
       MOV 2, @2  ;  --.
       JMP -2     ;    | +2
       DAT #0, #4 ; <--' --. The B-field of the MOV points here.
       ...                 |
       ...                 | +4
       ...                 |
       DAT #0, #4 ; <------' The B-field of the DAT points here.

As you can see, the DAT will be copied 4 instructions ahead of it. The next instruction, JMP, simply makes the process jump two instructions backwards, back to the ADD. Since the JMP ignores its B-field I've left it empty. The MARS will initialise it for me as a 0.

By the way, as you see the MARS will not start tracing further chains of indirect addresses. If the indirect operand points to an instruction with a B-field of, say, 4, the actual destination will be 4 instructions after it, regardless of the addressing mode.

Now the ADD and the MOV will be executed again. When the execution reaches the JMP again, the core looks like this:

       ADD #4, 3
       MOV 2, @2
       JMP -2           ; next instruction
       DAT #0, #8
       ...
       ...
       ...
       DAT #0, #4
       ...
       ...
       ...
       DAT #0, #8

The Dwarf will keep on dropping DATs every 4 instructions, until it has looped around the whole core and reached itself again:

       ...
       DAT #0, #-8
       ...
       ...
       ...
       DAT #0, #-4
       ADD #4, 3        ; next instruction
       MOV 2, @2
       JMP -2
       DAT #0, #-4
       ...
       ...
       ...
       DAT #0, #4
       ...

Now, the ADD will turn the DAT back into #0, #0, the MOV will perform an exercise in futility by copying the DAT right where it already is, and the whole process will start again from the beginning.

This naturally won't work unless the size of the core is divisible by 4, since otherwise the Dwarf would hit an instruction from 1 to 3 instructions behind the DAT, thus killing itself. Luckily, the most popular core size is currently 8000, followed by 8192, 55400, 800, all of them divisible by 4, so our Dwarf should be safe

As a side note, including the DAT #0, #0 in the warrior wouldn't really have been necessary; The instruction the core is initially filled with, which I've written as three dots (...) is actually DAT 0, 0. I'll continue to use the dots to describe empty core, since is it shorter and easier to read.

The addressing modes

In the first versions of Core War the only addressing modes were the immediate (#), the direct ($ or nothing at all) and the B-field indirect (@) addressing modes. Later, the predecrement addressing mode, or <, was added. It's the same as the indirect mode, except that the pointer will be decremented by one before the target address is calculated.

       DAT #0, #5
       MOV 0, <-1       ; next instruction

When this MOV is executed, the result will be:

       DAT #0, #4 ;  ---.
       MOV 0, <-1 ;     |
       ...        ;     | +4
       ...        ;     |
       MOV 0, <-1 ; <---'

The ICWS '94 standard draft added four more addressing modes, mostly to deal with A-field indirection, to give a total of 8 modes:

# -- immediate

$ -- direct (the $ may be omitted)

* -- A-field indirect

@ -- B-field indirect

{ -- A-field indirect with predecrement

< -- B-field indirect with predecrement

} -- A-field indirect with postincrement

> -- B-field indirect with postincrement

The postincrement modes are similar to the predecrement, but the pointer will be incremented by one after the instruction has been executed, as you might have guessed.

       DAT #5, #-10
       MOV -1, }-1      ; next instruction

will after execution look like this:

       DAT #6, #-10 ;  --.
       MOV -1, }-1  ;    |
       ...          ;    |
       ...          ;    | +5
       ...          ;    |
       DAT #5, #-10 ; <--'

One important thing to remember about the predecrement and postincrement modes is that the pointers will be in-/decremented even if they're not used for anything. So JMP -1, <100 would decrement the instruction 100 even if the value it points to isn't used for anything. Even DAT <50, <60 will decrement the addresses in addition to killing the process.

The process queue

If you looked at the instruction table a few chapters above closely, you may have wondered about an instruction named SPL. There's certainly nothing like that in any ordinary assembly language...

Quite early in the history of Core War, it was suggested that adding multitasking to the game would make it much more interesting. Since the rough time-slicing techniques used in ordinary systems wouldn't fit in the abstract Core War environment (most importantly, you'd need an OS to control them), a system was invented whereby each process is executed for one cycle in turn.

The instruction used to create new processes is the SPL. It takes an address as a parameter in its A-field, just like JMP. The difference between JMP and SPL is that, in addition to starting execution at the new address, SPL also continues execution at the next instruction.

The two -- or more -- processes thus created will share the processing time equally. Instead of a single process counter that would show the current instruction, the MARS has a process queue, a list of processes that are executed repeatedly in the order in which they were started. New processes created by SPL are added just after the current process, while those that execute a DAT will be removed from the queue. If all the processes die, the warrior will lose.

It's important to remember that each program has its own process queue. With more than one program in the core, they will be executed alternately, one cycle at a time regardless of the length of their process queue, so that the processing time will always be divided equally. If program A has 3 processes, and program B only 1, the order of execution will look like:

program A, process 1,

program B, process 1,

program A, process 2,

program B, process 1,

program A, process 3,

program B, process 1,

program A, process 1,

program B, process 1,

... And finally, a small example of the use of SPL. More information will be available in the later chapters.

       SPL 0            ; execution starts here
       MOV 0, 1

Since the SPL points to itself, after one cycle the processes will be like this:

       SPL 0            ; second process is here
       MOV 0, 1         ; first process is here

After both of the processes have executed, the core will now look like:

       SPL 0            ; third process is here
       MOV 0, 1         ; second process is here
       MOV 0, 1         ; first process is here

So this code evidently launches a series of imps, one after another. It will keep on doing this until the imps have circled the whole core and overwrite the SPL.

The size of the process queue for each program is limited. If the maximum number of processes has been reached, SPL continues execution only at the next instruction, effectively duplicating the behaviour of NOP. In most cases the process limit is quite high, often the same as the length of the core, but it can be lower (even 1, in which case splitting is effectively disabled).

Oh, and as for thruth often being stranger than fiction, I recently came across a web page titled "Opcodes that should've been". Amongst some really absurd ones I found "BBW -- Branch Both Ways". As all the opcodes were supposed to be fictitious, I can only conclude that the author wasn't familiar with Redcode..

The instruction modifiers

The most important new thing brought by the ICWS '94 standard wasn't the new instructions or the new addressing modes, but the modifiers. In the old '88 standard the addressing modes alone decide which parts of the instructions are affected by an operation. For example, MOV 1, 2 always moves a whole instruction, while MOV #1, 2 moves a single number. (and always to the B-field!)

Naturally, this could cause some difficulties. What if you wanted to move only the A- and B-fields of an instruction, but not the OpCode? (you'd need to use ADD) Or what if you wanted to move something from the B-field to the A-field? (possible, but very tricky) To clarify the situation, the instruction modifiers were invented.

The modifiers are suffixes that are added to the instruction to specify which parts of the source and the destination it will affect. For example, MOV.AB 4, 5 would move the A-field of the instruction 4 into the B-field of the instruction 5. There are 7 different modifiers available:

  • MOV.A -- moves the A-field of the source into the A-field of the destination
  • MOV.B -- moves the B-field of the source into the B-field of the destination
  • MOV.AB -- moves the A-field of the source into the B-field of the destination
  • MOV.BA -- moves the B-field of the source into the A-field of the destination
  • MOV.F -- moves both fields of the source into the same fields in the destination
  • MOV.X -- moves both fields of the source into the opposite fields in the destination
  • MOV.I -- moves the whole source instruction into the destination

Naturally the same modifiers can be used for all instructions, not just for MOV. Some instructions like JMP and SPL, however, don't care about the modifiers. (Why should they? They don't handle any actual data, they just jump around.)

Since not all the modifiers make sense for all the instructions, they will default to the closest one that does make sense. The most common case involves the .I modifier: To keep the language simple and abstract no numerical equivalents have been defined for the OpCodes, so using mathematical operations on them wouldn't make any sense at all. This means that for all instructions except MOV, SEQ and SNE (and CMP which is just an alias for SEQ) the .I modifier will mean the same as the .F.

Another thing to remember about the .I and the .F is that the addressing modes too are part of the OpCode, and are not copied by MOV.F

We can now rewrite the old programs to use modifiers as an example. The Imp would naturally be MOV.I 0, 1. The Dwarf would become:

       ADD.AB #4, 3
       MOV.I  2, @2
       JMP    -2
       DAT    #0, #0

Note that I've left out the modifiers for JMP and DAT since they don't use them for anything. The MARS turns them into (for example) JMP.B and DAT.F, but who cares?

How did I know which modifier to add to which instruction? (and, more importantly, how does the MARS add them if we leave them off?) Well, you can usually do it with a bit of common sense, but the '94 standard does defines a set of rules for that purpose.

DAT, NOP

Always .F, but it's ignored.

MOV, SEQ, SNE, CMP

If A-mode is immediate, .AB,

if B-mode is immediate and A-mode isn't, .B,

if neither mode is immediate, .I.

ADD, SUB, MUL, DIV, MOD

If A-mode is immediate, .AB,

if B-mode is immediate and A-mode isn't, .B,

if neither mode is immediate, .F.

SLT, LDP, STP

If A-mode is immediate, .AB,

if it isn't, (always!) .B.

JMP, JMZ, JMN, DJN, SPL

Always .B (but it's ignored for JMP and SPL).

References

Cite error: Invalid <references> tag; parameter "group" is allowed only.

Use <references />, or <references group="..." />
  1. Redcode - instruction for beginers