Real Geeks Assemble: Beginner Assembly Part 2

Beginner Assembly Language for PPC Darwin

(oh, and Mac OS X too ; – )

copyright 2003 by Terrance Gene Davis

Hello there!

Those of you who have written programs in other languages have run across the famous “Hello world!” example. Usually it is the first example written in an unfamiliar language. At the day job, whenever someone gets a rudimentary example of some new software technology working, they mention that they have a “hello world” done.

So why did I not start with a “hello world” in the last chapter? It would have been too complex for the first chapter. But now that you’ve seen some basics, here’s your example.

Save the first source file as “main.c”.


    #include <stdio.h>

    /* Defined in assem.s; returns a pointer to a C string. */
    char *assembleFunction(void);

    int main(void) {
        printf("%s", assembleFunction() );

        return 0;
    }

Save the second source file as “assem.s”.

    .cstring
    MY_STRING:
            ; This is a C string.
            .ascii "hello world!\n\0"
    .text
    .globl _assembleFunction

    _assembleFunction:

            ; MY_STRING is a pointer to the C string.
            ; We copy the address to r3 in two steps.
            addis	r3, 0, hi16(MY_STRING)
            ori		r3, r3, lo16(MY_STRING)

            blr

Now compile it with the command “cc main.c assem.s -o run_me”, and run it with the command “./run_me”.

Congratulations! You’ve written your first hello world in assembly.


“What did you do?!”

So what is it that you just did? You created a C-style string in assembly and returned a pointer to the string when the function assembleFunction() was called. Then, as you can guess, it was printed.

If you look at the assembly source code carefully, you will notice I’ve put in some explanations of what is happening. In PowerPC assembly compiled by cc/gcc, “;” is the preferred means of indicating that a comment follows. This is the equivalent of “//” in Java and C++. It means that anything from that point until the end of the line is not code.

Now you’ve also been introduced to “.cstring” and “.text”. These are called sections. When the program is in its final form (the run_me file in our case), the pieces that are in the same type of section are assembled together. .cstring indicates that the following section contains strings of the type found in the C language. (Note: Strings in C are a set of characters followed immediately by a \0.) .text announces that we are defining the functions that actually do something.

We also use our first label in this example. Our label is MY_STRING. It doesn’t need to be capitalized, but you should avoid label names that start with a capital L or a number for now. The label MY_STRING represents the address in the computer that holds the string “hello world!\n”. It’s a pointer to a string. So yes, regular C programmers’ ears are perking up. Yes, a pointer to a C string, with all the implications that has. Now the question is how do you get it back to the C function main() that called assembleFunction()? The address that MY_STRING represents must now be placed into general purpose register r3. The lines:

            addis	r3, 0, hi16(MY_STRING)
            ori		r3, r3, lo16(MY_STRING)

take care of this. “addis” stands for “Add Immediate Shifted”. “ori” stands for “OR Immediate”. We’re about to get technical, so close your eyes and skip down a couple of paragraphs to miss the ugly stuff.

PowerPCs (PPCs), such as the CPUs in Apple’s G3 boxes that run Mac OS X, have 32 General Purpose Registers. They are referred to as r0 – r31. We have been using General Purpose Register r3 for the return values of our functions. r3 – r10 are volatile, and you can generally store what you want in them throughout the course of your function.

These registers can contain 32 bits each. That’s four bytes for those who are counting. Suppose you have a totally fictitious address of 0x1234ABCD (written in hex), this happens to be the address that MY_STRING stands for, and you want to copy it to r3 to use as a pointer to your string. You need to copy it in two chunks, because a single instruction only has room for 16 bits of immediate data. There are several ways to do this, but I’ve chosen the one I think is easiest to follow.

Using our fictional address as the example, addis r3,0,hi16(MY_STRING) copies the upper 16 bits, 0x1234, into the upper half of r3 and leaves the lower half zero. r3 now contains 0x12340000. We finish off by copying the lower 16 bits, 0xABCD, into r3’s lower half with the command ori r3, r3, lo16(MY_STRING). r3 now contains the number/address 0x1234ABCD.

Finally we call our Branch to Link Register command (blr), which returns us to the main() function we came from with our precious cargo: a string pointer. In other words, the address in memory of the string we created in assembly.


Multiple Assembly Functions

What if you want to create an assembly source file that contains multiple functions? Here’s an example that shows two different strings being returned from two different functions in the same source file. Type them in and compile them as with the first example in this chapter.

File 1:



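    #include <stdio.h>

    /* Both functions are defined in the assembly file. */
    char *assembleFunction1(void);
    char *assembleFunction2(void);

    int main(void) {
        printf("%s", assembleFunction1() );
        printf("%s", assembleFunction2() );

        return 0;
    }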

File 2:

    .cstring

    MY_STRING1:
            ; This is a C string. When .asciz is
            ; used it silently adds the \0.
            .asciz "Funky function ONE\n"
    MY_STRING2:
            ; This is another C string.
            .asciz "Funky function TWO\n"

    .text

    .globl _assembleFunction1
    .globl _assembleFunction2

    _assembleFunction1:

            addis	r3, 0, hi16(MY_STRING1)
            ori		r3, r3, lo16(MY_STRING1)

            blr

    _assembleFunction2:

            addis	r3, 0, hi16(MY_STRING2)
            ori		r3, r3, lo16(MY_STRING2)

            blr

In this program we have two strings labeled MY_STRING1 and MY_STRING2. We also have two functions, assembleFunction1() and assembleFunction2(). The only thing new here is that the directive .asciz was used instead of .ascii. By using it, we did not have to put the terminating null (\0) at the end of the string ourselves. It was done for us.


Putting It All Together

This sample combines all that we’ve covered so far. I thought that it might be useful for you to see it all in one example.

File one:

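    #include <stdio.h>

    /* All three functions are defined in the assembly file. */
    char *getStringOne(void);
    int getNumber(void);
    char *getStringTwo(void);

    int main(void) {
        printf("%s", getStringOne() );
        printf("%d", getNumber() );
        printf("%s", getStringTwo() );

        return 0;
    }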

File two:

    .cstring

    FIRST:
            .ascii "Putting it all together with two strings and \0"
    SECOND:
            .asciz " number.\n"

    .text

    .globl _getStringOne
    .globl _getNumber
    .globl _getStringTwo

    _getStringOne:

            addis	r3, 0, hi16(FIRST)
            ori		r3, r3, lo16(FIRST)

            blr

    _getNumber:

            li	r3, 1
            blr

    _getStringTwo:

            addis	r3, 0, hi16(SECOND)
            ori		r3, r3, lo16(SECOND)

            blr

Assignments


1) Write a program similar to the first example in this chapter from memory. You need to learn to do this sometime, and it will help you understand further concepts we discuss.

2) Try modifying the first example in this chapter to use the “.ascii” directive, but with no terminating “\0”. What is the message you get when you try to compile?

3) Do the modification from assignment 2 to example 3 in this chapter. Does it compile? If so, what happens when you run the program? Try to explain the result.

4) Create a program that calls four different assembly functions from a C program. At least one must return a string and one must return a number. Run it and make sure it works.