Apple1 to C compiler

Overview
Functionality
Considerations and Limitations
Example
Output Code Details
Test Cases

Overview

The compiler reads assembled Apple1 6502 opcodes as input and outputs equivalent C code that runs on Linux and Mac.

Functionality

Opcodes in hex are parsed into a list of struct type ParsedInstruction.
The parsed instruction list is iterated and produces C code.
There are helper functions and a ComputerState struct that holds the registers and memory contents.
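
As a rough illustration of the parsing step, the sketch below turns hex text into bytes and then into a list of instructions. The fields of ParsedInstruction, the function names hex_to_bytes and parse_instructions, and the small opcode table are assumptions made for this example; the real compiler handles many more opcodes and may be organized differently. BRK is treated as a one-byte instruction and unknown bytes are skipped one at a time, which is only a guess at the real behavior.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout; the real ParsedInstruction may differ. */
struct ParsedInstruction {
    uint16_t addr;    /* address of the opcode byte */
    uint8_t  opcode;
    uint8_t  length;  /* total size in bytes: 1, 2 or 3 */
    uint16_t operand; /* little-endian operand, 0 if there is none */
};

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hex_value(int c)
{
    if (c >= '0' && c <= '9') return c - '0';
    c = tolower(c);
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Convert hex text to bytes, ignoring whitespace.
 * Returns the byte count, or -1 on a bad digit, an odd digit count,
 * or when the output buffer is too small. */
int hex_to_bytes(const char *text, uint8_t *out, size_t cap)
{
    size_t n = 0;
    int hi = -1; /* pending high nibble, -1 if none */

    assert(text != NULL && out != NULL);
    for (; *text; text++) {
        if (isspace((unsigned char)*text)) continue;
        int v = hex_value((unsigned char)*text);
        if (v < 0) return -1;
        if (hi < 0) { hi = v; continue; }
        if (n == cap) return -1;
        out[n++] = (uint8_t)((hi << 4) | v);
        hi = -1;
    }
    return hi < 0 ? (int)n : -1;
}

/* Instruction size for the few opcodes this sketch knows; 0 if unknown. */
static uint8_t opcode_length(uint8_t op)
{
    switch (op) {
    case 0x00:                                  /* BRK */
        return 1;
    case 0xA2: case 0xA9: case 0xA0:            /* LDX/LDA/LDY immediate */
    case 0x85: case 0xA1:                       /* STA zero page, LDA (zp,X) */
        return 2;
    case 0x8D: case 0x8C: case 0x4C:            /* STA/STY absolute, JMP absolute */
        return 3;
    default:
        return 0;
    }
}

/* Decode bytes loaded at address base into out[].
 * Returns the instruction count, or -1 if out[] is too small or the
 * last instruction is truncated. Unknown bytes are skipped. */
int parse_instructions(const uint8_t *bytes, size_t n, uint16_t base,
                       struct ParsedInstruction *out, size_t cap)
{
    size_t count = 0;
    size_t i = 0;

    while (i < n) {
        uint8_t len = opcode_length(bytes[i]);
        if (len == 0) { i++; continue; }
        if (i + len > n || count == cap) return -1;
        struct ParsedInstruction *p = &out[count++];
        p->addr = (uint16_t)(base + i);
        p->opcode = bytes[i];
        p->length = len;
        p->operand = 0;
        if (len >= 2) p->operand = bytes[i + 1];
        if (len == 3) p->operand |= (uint16_t)(bytes[i + 2] << 8);
        i += len;
    }
    return (int)count;
}
```

Calling hex_to_bytes on the program text and then parse_instructions with a base of 0xFF00 yields one entry per instruction, each carrying the address that a label such as LFF0C would be derived from.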

Considerations and Limitations

The current implementation stays close to emulation: arithmetic and flags are handled the way the CPU handles them. Future work could include analyzing the parsed instruction list and restructuring it to produce a more abstract representation.

The intention was to translate the code and not to create run-time emulation. There is no support for run-time decoding of opcodes. Self-modifying code will not work.

Branching is supported using labels and goto. The 6502 JMP instruction takes an absolute or an indirect address (relative offsets are used only by the conditional branch instructions), and where the destination needs to be calculated there is a switch structure to support jumping to labels based on the computed address:

  Lswitch:
      switch(lSwitchTarget) {
      case 0xFF00:    goto LFF00;
      case 0xFF02:    goto LFF02;
      ...
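
To make the dispatch pattern concrete, here is a minimal, self-contained sketch of a label-per-address function with an Lswitch block. Only the names Lswitch, lSwitchTarget and the LFFxx label style come from the excerpt above; the function run_from, its parameters, and the toy four-label "program" are invented for this illustration and are not taken from the generated code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Runs a toy program starting at address entry. Each "print" appends
 * one letter to out. Returns the number of letters written, or -1 if
 * entry has no label or cap is 0. */
int run_from(uint16_t entry, char *out, size_t cap)
{
    size_t n = 0;                   /* letters written so far */
    uint16_t lSwitchTarget = entry; /* computed jump target */

    assert(out != NULL);
    if (cap == 0) return -1;

Lswitch:
    switch (lSwitchTarget) {
    case 0xFF00: goto LFF00;
    case 0xFF02: goto LFF02;
    case 0xFF05: goto LFF05;
    case 0xFF06: goto LFF06;
    default:     return -1;         /* no label at this address */
    }

LFF00:  /* stands in for "print A" */
    if (n + 1 < cap) out[n++] = 'A';
LFF02:  /* stands in for a jump whose target is only known at run time */
    lSwitchTarget = 0xFF06;
    goto Lswitch;
LFF05:  /* stands in for BRK; skipped when execution starts at 0xFF00 */
    goto Ldone;
LFF06:  /* stands in for "print B" */
    if (n + 1 < cap) out[n++] = 'B';
Ldone:
    out[n] = '\0';
    return (int)n;
}
```

Starting at 0xFF00 writes "AB": the jump at LFF02 stores its target in lSwitchTarget and re-enters the switch, which skips the BRK stand-in at LFF05. One trade-off of this design is that every possible target needs a case, so an address with no generated label cannot be reached.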

Instructions are loaded into memory and execution starts at 0xFF00. Currently the input must fit in 256 bytes. Bytes that do not decode to a known opcode (data embedded in the program, for example) are reported as shown below; no C code is generated at that address, but the values are still loaded into memory:

  $ ./apple1 > wozmon.c
  Error unknown op code 0F at character 251 (position FFFB)
  Error unknown op code FF at character 253 (position FFFD)

Example

From https://skilldrick.github.io/easy6502/ under "Indexed indirect: ($c0,X)" there is an example program. Let's change one byte from 0a to 44 (from '\n' to 'D') and add comments:

  ; load 0x01 into X register
  LDX #$01
  ; load 0x05 into memory 0x0001
  LDA #$05
  STA $01
  ; load 0x07 into memory 0x0002
  LDA #$07
  STA $02
  ; load 0x44 ('D') into Y register
  LDY #$44
  ; store contents of Y ('D') into memory 0x0705
  STY $0705
  ; use indexed indirect addressing to fetch the 'D' into the A register
  LDA ($00,X)
  ; -- additional code added to print the contents of A to the screen --
  STA $D012
  ; stop
  BRK

After assembling, the hex-digit representation looks like:

  $ cat index-indirect.hex
  a201a9058501a9078502a0448c0507a1008d12d000

Compiling it:

  $ ./apple1 -input index-indirect.hex > index-indirect.c
  $ gcc index-indirect.c -o a.out
  $ file a.out
  a.out: Mach-O 64-bit executable x86_64
  $ ls -l a.out
  -rwx------ 1 chad chad 18840 26 Jan 09:26 a.out

Running it:

  $ ./a.out ; echo
  D

Output Code Details

The core code in main() looks like a C representation of each opcode:

  LFF00:
      fprintf(stderr, "LFF00\n");
      // LDX Immediate 01
      arg = 0x01;
      op_ldx(&state, arg);
      debug_print_state(&state);
  LFF02:
      fprintf(stderr, "LFF02\n");
      // LDA Immediate 05
      arg = 0x05;
      op_lda(&state, arg);
      debug_print_state(&state);
  LFF04:
      fprintf(stderr, "LFF04\n");
      // STA ZeroPage 01
      arg = 0x0001;
      op_sta(&state, arg);
      debug_print_state(&state);
  LFF06:
      fprintf(stderr, "LFF06\n");
      // LDA Immediate 07
      arg = 0x07;
      op_lda(&state, arg);
      debug_print_state(&state);
  LFF08:
      fprintf(stderr, "LFF08\n");
      // STA ZeroPage 02
      arg = 0x0002;
      op_sta(&state, arg);
      debug_print_state(&state);
  LFF0A:
      fprintf(stderr, "LFF0A\n");
      // LDY Immediate 44
      arg = 0x44;
      op_ldy(&state, arg);
      debug_print_state(&state);
  LFF0C:
      fprintf(stderr, "LFF0C\n");
      // STY Absolute 0507
      arg = 0x0705;
      op_sty(&state, arg);
      debug_print_state(&state);
  LFF0F:
      fprintf(stderr, "LFF0F\n");
      // LDA IndirectX 00
      arg = (0x0000 + state.X) & 0xFF;
      arg = (memory_read(state.Mem, arg + 1) << 8) | memory_read(state.Mem, arg);
      arg = memory_read(state.Mem, arg);
      op_lda(&state, arg);
      debug_print_state(&state);
  LFF11:
      fprintf(stderr, "LFF11\n");
      // STA Absolute 12D0
      arg = 0xD012;
      op_sta(&state, arg);
      debug_print_state(&state);
  LFF14:
      fprintf(stderr, "LFF14\n");
      // BRK Implicit
      op_brk(&state);
      debug_print_state(&state);

The opcode helper functions look much as expected:

  void op_ldx(struct ComputerState *state, short int arg) {
      fprintf(stderr, "op_ldx(0x%02X)\n", (uint8_t) arg);
      state->X = arg;
      flag_update_nz(state, arg);
  }

Reading memory checks for the memory-mapped keyboard (and the ASCII high bit is always set):

  char memory_read(char mem[], short int addr) {
      usleep(10000);
      fprintf(stderr, "memory_read(0x%04X)\n", (uint16_t) addr);
      switch(addr) {
      case (short int) 0xD010:
          if (kbhit()) {
              mem[0xD010] = getchar();
              if (mem[0xD010] >= 'a' && mem[0xD010] <= 'z') {
                  mem[0xD010] = mem[0xD010] - ('a' - 'A');
              }
              mem[0xD010] |= 0x80;
              // read input key \n as \r
              if (mem[0xD010] == (char) 0x8A) {
                  mem[0xD010] = 0x8D;
              }
          }
          return mem[0xD010];
      case (short int) 0xD011:
          // check for keypress
          fprintf(stderr, "checking for keypress\n");
          if (kbhit()) {
              return 0x80;
          }
          return 0;
      case (short int) 0xD012:
          return 0;
      default:
          return mem[addr & 0xFFFF];
      }
  }

All input bytes are loaded into memory for programs that reference hard-coded values using addresses:

  // initialize program memory
  state.Mem[0xFF00] = 0xA2;
  state.Mem[0xFF01] = 0x01;
  ...

Test Cases

There is a test_cases.txt file format that allows for easily adding new test cases by specifying the opcodes in hex and the expected STDOUT output. Every test case is run separately to produce a .c file, and that C code is compiled to an executable and run.

  # Blank lines and lines starting with '#' are ignored.
  #
  # Lines starting with:
  #
  # "baseaddr " contain the base address of the test code
  #
  # "name " contain the name of the test
  #
  # "head " contain instructions to run before the body
  # "tail " contain instructions to run after the body
  # "body " contain instructions for the body of the test
  #
  # "output" start the expected output
  # "endoutput" end the expected output
  #
  # "head", "body", "tail" values from previous tests are carried over to
  # the next test.
  #
  # Instructions are in 6502 hex. For example:
  # "A942" means "LDA #$42"
  #
  # Instructions may be separated by spaces to make them easier to read. For example:
  # "A942 6902" and "A9426902" both mean "LDA #$42, ADC #$02"
  ...

  # GOAL: JMP

  name JMP absolute
  body A941 8D12D0 4C0AFF 00 A942
  output
  AB
  endoutput

  name JMP indirect
  body A912 8576 A9FF 8577 A941 8D12D0 6C7600 00 A942
  output
  AB
  endoutput
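
For readers who want to process test_cases.txt themselves, the sketch below classifies a single line according to the rules in the file's header comment. The enum LineKind and the function classify_line are hypothetical names for this example, and this is not the project's actual test runner. It deliberately does not track whether the reader is inside an output/endoutput block, so the caller has to treat lines between those two markers as literal expected output.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical line categories for the test_cases.txt format. */
enum LineKind {
    LINE_IGNORED,   /* blank line or '#' comment */
    LINE_BASEADDR,
    LINE_NAME,
    LINE_HEAD,
    LINE_TAIL,
    LINE_BODY,
    LINE_OUTPUT,    /* starts the expected output */
    LINE_ENDOUTPUT, /* ends the expected output */
    LINE_UNKNOWN    /* anything else, e.g. a line of expected output */
};

/* Classify one line. For keyword lines that carry a value, *value is
 * set to the text after the keyword and its space; otherwise NULL. */
enum LineKind classify_line(const char *line, const char **value)
{
    static const struct {
        const char *prefix;
        enum LineKind kind;
    } table[] = {
        { "baseaddr ", LINE_BASEADDR },
        { "name ",     LINE_NAME },
        { "head ",     LINE_HEAD },
        { "tail ",     LINE_TAIL },
        { "body ",     LINE_BODY },
    };

    assert(line != NULL && value != NULL);
    *value = NULL;

    if (line[0] == '\0' || line[0] == '\n' || line[0] == '#')
        return LINE_IGNORED;
    if (strncmp(line, "endoutput", 9) == 0)
        return LINE_ENDOUTPUT;
    if (strncmp(line, "output", 6) == 0)
        return LINE_OUTPUT;

    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++) {
        size_t len = strlen(table[i].prefix);
        if (strncmp(line, table[i].prefix, len) == 0) {
            *value = line + len;
            return table[i].kind;
        }
    }
    return LINE_UNKNOWN;
}
```

A runner built on this would remember the most recent head, body and tail values between tests, since the format carries them over from one test to the next.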