Overview
Functionality
Considerations and Limitations
Example
Output Code Details
Test Cases

Overview

The translator reads assembled Apple 1 6502 machine code (opcodes) as input and outputs equivalent C code that compiles and runs on Linux and macOS.

Functionality

Considerations and Limitations

The current implementation mirrors how the CPU performs arithmetic and updates flags, so the generated code reads more like an emulator than like hand-written C. Future work could analyze the parsed instruction list and restructure it into a more abstract representation.
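To make the emulation-like approach concrete: an instruction such as ADC becomes code that reproduces the CPU's arithmetic bit for bit, including the carry and overflow flags. The following is a minimal sketch in binary mode only (decimal mode is not handled). `struct Flags` and `adc()` are hypothetical names, not the translator's own helpers:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subset of the 6502 status flags. */
struct Flags {
    bool C, Z, V, N;
};

/* ADC in binary mode: A + M + C, updating C, Z, V and N as the CPU does.
   Decimal (BCD) mode is not handled in this sketch. */
static uint8_t adc(uint8_t a, uint8_t m, struct Flags *f) {
    unsigned sum = (unsigned) a + m + (f->C ? 1u : 0u);
    uint8_t result = (uint8_t) sum;

    f->C = sum > 0xFF;                                /* unsigned carry out of bit 7 */
    f->V = ((~(a ^ m)) & (a ^ result) & 0x80) != 0;   /* same-sign inputs, different-sign result */
    f->Z = result == 0;
    f->N = (result & 0x80) != 0;
    return result;
}
```

With the carry clear, `adc(0x42, 0x02, &f)` yields 0x44, the same result as "LDA #$42, ADC #$02" from the test case comments. `adc(0x50, 0x50, &f)` yields 0xA0 with V and N set, because adding two positive signed values produced a negative one.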

The intention was to translate the code, not to create a run-time emulator. Opcodes are never decoded at run time, so self-modifying code will not work.

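Branching is supported using labels and goto. Branch instructions use relative offsets that can be resolved to labels during translation, but some destinations are only known at run time (JMP indirect, for example, reads its target address from memory). For those, the address is computed and a switch statement jumps to the matching label: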
  Lswitch:
      switch(lSwitchTarget) {
      case 0xFF00:    goto LFF00;
      case 0xFF02:    goto LFF02;
      ...
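The following is a minimal sketch of reaching a label from an address computed at run time, assuming a 64 KiB memory array. `jmp_indirect()`, its return values, and the two labels are hypothetical. The pointer values mirror the "JMP indirect" test case, which stores the target 0xFF12 at $76/$77:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch: JMP ($pointer) followed by switch-based dispatch.
   Returns which translated block was reached, or -1 if none matches. */
static int jmp_indirect(const uint8_t mem[0x10000], uint16_t pointer) {
    /* The NMOS 6502 does not carry into the high byte when the pointer
       ends in $FF, so the high byte comes from the start of the same page. */
    uint16_t hi_addr = (uint16_t) ((pointer & 0xFF00) | ((pointer + 1) & 0x00FF));
    uint16_t lSwitchTarget = (uint16_t) (mem[pointer] | (mem[hi_addr] << 8));
    goto Lswitch;

LFF00:
    return 1;   /* translated code for 0xFF00 would go here */
LFF12:
    return 2;   /* translated code for 0xFF12 would go here */

Lswitch:
    switch (lSwitchTarget) {
    case 0xFF00: goto LFF00;
    case 0xFF12: goto LFF12;
    default:     return -1;   /* no translated code at this address */
    }
}
```

In generated code, the switch would have one case per instruction address.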
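Instructions are loaded into memory and execution starts at 0xFF00, so the input currently must fit in the 256 bytes from 0xFF00 to 0xFFFF. Bytes that do not decode to a known opcode (in Wozmon, data such as the interrupt vectors at the end of the image) produce the messages below. No C code is generated at those addresses, but their values are still loaded into memory: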
  $ ./apple1 > wozmon.c
  Error unknown op code 0F at character 251 (position FFFB)
  Error unknown op code FF at character 253 (position FFFD)
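The 256-byte limit is consistent with loading the input at 0xFF00, which is only 256 bytes below the top of the 16-bit address space. The following is a minimal sketch of that size check and copy. `load_image()` and its return convention are hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BASE_ADDR 0xFF00u                   /* execution starts here */
#define MAX_IMAGE (0x10000u - BASE_ADDR)    /* 256 bytes: 0xFF00..0xFFFF */

/* Hypothetical helper: copy program bytes into memory at BASE_ADDR.
   Returns 0 on success, or -1 if the program would run past 0xFFFF. */
static int load_image(uint8_t mem[0x10000], const uint8_t *bytes, size_t len) {
    if (len > MAX_IMAGE) {
        return -1;
    }
    memcpy(&mem[BASE_ADDR], bytes, len);
    return 0;
}
```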

Example

The easy6502 tutorial at https://skilldrick.github.io/easy6502/ has an example program under "Indexed indirect: ($c0,X)". Let's change one byte from 0a to 44 (from '\n' to 'D'), add two instructions that print the result and stop, and add comments:
  ; load 0x01 into X register
  LDX #$01

  ; load 0x05 into memory 0x0001
  LDA #$05
  STA $01

  ; load 0x07 into memory 0x0002
  LDA #$07
  STA $02

  ; load 0x44 ('D') into Y register
  LDY #$44

  ; store contents of Y ('D') into memory 0x0705
  STY $0705

  ; use indexed indirect addressing to fetch the 'D' into the A register
  LDA ($00,X)

  ; -- additional code added to print the contents of A to the screen --
  STA $D012

  ; stop
  BRK
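Indexed indirect addressing is what lets the final LDA find the 'D': X is added to the zero-page operand, and the two bytes at that zero-page location form a little-endian pointer. The following is a minimal sketch of the address calculation. `indexed_indirect_addr()` is a hypothetical helper:

```c
#include <assert.h>
#include <stdint.h>

/* Effective address for LDA ($zp,X): add X to the zero-page operand,
   then read a little-endian pointer from that zero-page location. */
static uint16_t indexed_indirect_addr(const uint8_t mem[0x10000], uint8_t zp, uint8_t x) {
    uint8_t ptr = (uint8_t) (zp + x);          /* the sum wraps within page zero */
    uint8_t lo = mem[ptr];
    uint8_t hi = mem[(uint8_t) (ptr + 1)];     /* so does the high-byte fetch */
    return (uint16_t) (lo | (hi << 8));
}
```

With X = 1, the operand $00 selects $01/$02, which hold $05 and $07, so the effective address is $0705.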

After assembling, the hex-digit representation looks like:
  $ cat index-indirect.hex
  a201a9058501a9078502a0448c0507a1008d12d000
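Each label in the generated code is the address of one decoded instruction. Producing the labels means parsing the hex digits and stepping through the bytes by instruction length. The following is a minimal sketch that only knows the opcodes used in this example. `parse_hex()`, `op_length()`, and `instruction_addrs()` are hypothetical, and a real translator needs a table covering every opcode and addressing mode:

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Convert hex digits (spaces allowed) to bytes. Returns the byte count,
   or -1 on a bad digit, an odd digit count, or a full output buffer. */
static int parse_hex(const char *s, uint8_t *out, size_t cap) {
    size_t n = 0;
    int high = -1;   /* pending high nibble, or -1 if none */
    for (; *s; s++) {
        if (isspace((unsigned char) *s)) continue;
        if (!isxdigit((unsigned char) *s)) return -1;
        int v = isdigit((unsigned char) *s) ? *s - '0'
                                            : tolower((unsigned char) *s) - 'a' + 10;
        if (high < 0) { high = v; continue; }
        if (n == cap) return -1;
        out[n++] = (uint8_t) ((high << 4) | v);
        high = -1;
    }
    return high < 0 ? (int) n : -1;
}

/* Instruction length for the opcodes in this example only; -1 if unknown. */
static int op_length(uint8_t op) {
    switch (op) {
    case 0x00: return 1;                          /* BRK (treated as one byte) */
    case 0xA0: case 0xA2: case 0xA9:              /* LDY/LDX/LDA #imm */
    case 0x85: case 0xA1: return 2;               /* STA zp, LDA (zp,X) */
    case 0x8C: case 0x8D: return 3;               /* STY abs, STA abs */
    default:   return -1;
    }
}

/* Record the address of each instruction, starting at base. Unknown opcodes
   are reported and skipped one byte at a time. Returns the instruction count,
   or -1 on a truncated instruction or a full address array. */
static int instruction_addrs(const uint8_t *code, size_t len, uint16_t base,
                             uint16_t *addrs, size_t cap) {
    size_t i = 0, count = 0;
    while (i < len) {
        int n = op_length(code[i]);
        if (n < 0) {
            fprintf(stderr, "unknown op code %02X (position %04X)\n",
                    code[i], (unsigned) (base + i));
            i++;
            continue;
        }
        if (i + (size_t) n > len || count == cap) return -1;
        addrs[count++] = (uint16_t) (base + i);
        i += (size_t) n;
    }
    return (int) count;
}
```

For the hex above, this yields ten addresses from FF00 to FF14, which match the labels LFF00 through LFF14 in the generated code.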

Translating and compiling it:
  $ ./apple1 -input index-indirect.hex > index-indirect.c

  $ gcc index-indirect.c -o a.out

  $ file a.out
  a.out: Mach-O 64-bit executable x86_64

  $ ls -l a.out
  -rwx------  1 chad  chad  18840 26 Jan 09:26 a.out

Running it:

  $ ./a.out ; echo
  D

Output Code Details

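The core of main() is a direct C translation of each instruction. The comments list operand bytes in memory order, so an absolute address such as 0x0705 appears as 0507: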
  LFF00:  fprintf(stderr, "LFF00\n");
          // LDX Immediate 01
          arg = 0x01;
          op_ldx(&state, arg);
          debug_print_state(&state);
  
  LFF02:  fprintf(stderr, "LFF02\n");
          // LDA Immediate 05
          arg = 0x05;
          op_lda(&state, arg);
          debug_print_state(&state);
  
  LFF04:  fprintf(stderr, "LFF04\n");
          // STA ZeroPage 01
          arg = 0x0001;
          op_sta(&state, arg);
          debug_print_state(&state);
  
  LFF06:  fprintf(stderr, "LFF06\n");
          // LDA Immediate 07
          arg = 0x07;
          op_lda(&state, arg);
          debug_print_state(&state);
  
  LFF08:  fprintf(stderr, "LFF08\n");
          // STA ZeroPage 02
          arg = 0x0002;
          op_sta(&state, arg);
          debug_print_state(&state);
  
  LFF0A:  fprintf(stderr, "LFF0A\n");
          // LDY Immediate 44
          arg = 0x44;
          op_ldy(&state, arg);
          debug_print_state(&state);
  
  LFF0C:  fprintf(stderr, "LFF0C\n");
          // STY Absolute 0507
          arg = 0x0705;
          op_sty(&state, arg);
          debug_print_state(&state);
  
  LFF0F:  fprintf(stderr, "LFF0F\n");
          // LDA IndirectX 00
          arg = (0x0000 + state.X) & 0xFF;
          arg = ((uint8_t) memory_read(state.Mem, (arg + 1) & 0xFF) << 8) | (uint8_t) memory_read(state.Mem, arg);
          arg = memory_read(state.Mem, arg);
          op_lda(&state, arg);
          debug_print_state(&state);
  
  LFF11:  fprintf(stderr, "LFF11\n");
          // STA Absolute 12D0
          arg = 0xD012;
          op_sta(&state, arg);
          debug_print_state(&state);
  
  LFF14:  fprintf(stderr, "LFF14\n");
          // BRK Implicit
          op_brk(&state);
          debug_print_state(&state);
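The program prints 'D' because the last STA writes to 0xD012, the Apple 1 display register (DSP). The code that routes that store to standard output is not listed here. The following hypothetical sketch shows one way a store could reach the screen, using 7-bit ASCII and treating carriage return as a newline. `memory_write()` and `display_char()` are assumptions, not functions from the generated code:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define DSP 0xD012u   /* Apple 1 display output register */

/* The Apple 1 display uses 7-bit ASCII and treats carriage return
   (0x0D, or 0x8D with the high bit set) as "start a new line". */
static int display_char(uint8_t value) {
    int c = value & 0x7F;
    return c == '\r' ? '\n' : c;
}

/* Hypothetical memory_write: store the byte, and echo stores to DSP. */
static void memory_write(uint8_t mem[0x10000], uint16_t addr, uint8_t value) {
    mem[addr] = value;
    if (addr == DSP) {
        putchar(display_char(value));
        fflush(stdout);
    }
}
```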

The opcode helper functions are straightforward. For example, LDX loads the X register and updates the N and Z flags:
  void op_ldx(struct ComputerState *state, short int arg) {
          fprintf(stderr, "op_ldx(0x%02X)\n", (uint8_t) arg);
          state->X = arg;
          flag_update_nz(state, arg);
  }
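op_ldx() relies on flag_update_nz(), which is not listed. The following sketch shows what that function needs to do, assuming a hypothetical ComputerState layout with the status flags packed into one byte, P:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: the real generated struct may differ. */
struct ComputerState {
    uint8_t A, X, Y, SP;
    uint8_t P;              /* status register bits: N V - B D I Z C */
    uint8_t Mem[0x10000];
};

#define FLAG_Z 0x02
#define FLAG_N 0x80

/* Set Z if value is zero and N from bit 7; leave the other flags alone. */
static void flag_update_nz(struct ComputerState *state, uint8_t value) {
    state->P &= (uint8_t) ~(FLAG_Z | FLAG_N);
    if (value == 0) {
        state->P |= FLAG_Z;
    }
    if (value & 0x80) {
        state->P |= FLAG_N;
    }
}

/* LDX: load X, then update N and Z from the loaded value. */
static void op_ldx(struct ComputerState *state, uint8_t value) {
    state->X = value;
    flag_update_nz(state, value);
}
```

Unlike the listing, this sketch takes uint8_t rather than short int, so a byte with bit 7 set is never sign-extended on its way through the helpers.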

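memory_read() intercepts the Apple 1's memory-mapped I/O registers. Reading 0xD010 (KBD) returns the most recent key, converted to uppercase, with the ASCII high bit set and newline mapped to carriage return. Reading 0xD011 (KBDCR) reports whether a key is waiting. Reading 0xD012 (DSP) always returns 0, meaning the display is ready: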
  char memory_read(char mem[], short int addr) {
          usleep(10000);
          fprintf(stderr, "memory_read(0x%04X)\n", (uint16_t) addr);
          switch(addr) {
          case (short int) 0xD010:
                  if (kbhit()) {
                          mem[0xD010] = getchar();
                          if (mem[0xD010] >= 'a' && mem[0xD010] <= 'z') {
                                  mem[0xD010] = mem[0xD010] - ('a' - 'A');
                          }
                          mem[0xD010] |= 0x80;
                          // read input key \n as \r
                          if (mem[0xD010] == (char) 0x8A) {
                                  mem[0xD010] = 0x8D;
                          }
                  }
                  return mem[0xD010];
          case (short int) 0xD011:
                  // check for keypress
                  fprintf(stderr, "checking for keypress\n");
                  if (kbhit()) {
                          return 0x80;
                  }
                  return 0;
          case (short int) 0xD012:
                  return 0;
          default:
                  return mem[addr & 0xFFFF];
          }
  }
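kbhit() is not part of standard C or POSIX, so on Linux and macOS it has to be supplied. The following is a minimal POSIX sketch using select() with a zero timeout. It shows one possible implementation and is not necessarily the project's version:

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Return 1 if standard input has data ready to read, 0 otherwise.
   The zero timeout makes select() poll instead of blocking. */
static int kbhit(void) {
    fd_set fds;
    struct timeval timeout = {0, 0};

    FD_ZERO(&fds);
    FD_SET(STDIN_FILENO, &fds);
    return select(STDIN_FILENO + 1, &fds, NULL, NULL, &timeout) > 0;
}
```

In the terminal's default canonical mode, input only becomes readable after Enter, so single-key input also requires switching the terminal to non-canonical mode with tcsetattr(). Because getchar() reads through stdio's buffer, characters left over from a multi-character line can also sit in that buffer where select() no longer sees them.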

All input bytes are also loaded into memory, so programs can read hard-coded data by address:
  // initialize program memory
  state.Mem[0xFF00] = 0xA2;
  state.Mem[0xFF01] = 0x01;
  ...

Test Cases

The test_cases.txt file format makes it easy to add new test cases: each one specifies opcodes in hex along with the expected STDOUT output. Each test case is translated separately into its own .c file, which is compiled into an executable and run, and its output is compared with the expected output.
    # Blank lines and lines starting with '#' are ignored.
    #
    # Lines starting with:
    #
    #    "baseaddr " contain the base address of the test code
    #
    #    "name " contain the name of the test
    #
    #    "head " contain instructions to run before the body
    #    "tail " contain instructions to run after the body
    #    "body " contain instructions for the body of the test
    #
    #    "output" start the expected output
    #    "endoutput" end the expected output
    #
    # "head", "body", "tail" values from previous tests are carried over to
    # the next test.
    #
    # Instructions are in 6502 hex.  For example:
    #   "A942" means "LDA #$42"
    #
    # Instructions may be separated by spaces to make them easier to read. For example:
    #   "A942 6902" and "A9426902" both mean "LDA #$42, ADC #$02"

    ...

    # GOAL: JMP

    name JMP absolute
    body A941 8D12D0 4C0AFF 00 A942
    output
    AB
    endoutput

    name JMP indirect
    body A912 8576 A9FF 8577 A941 8D12D0 6C7600 00 A942
    output
    AB
    endoutput
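The format above is line-oriented, so a reader for it can be small. The following is a minimal sketch, assuming a hypothetical struct TestCase with fixed 256-byte fields. It is not the project's test runner: it does not translate, compile, or run anything. It only collects each test's fields and expected output, carrying values over between tests:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define FIELD 256

/* Hypothetical in-memory form of one test case. */
struct TestCase {
    char baseaddr[FIELD], name[FIELD];
    char head[FIELD], body[FIELD], tail[FIELD];
    char output[FIELD];   /* expected STDOUT, one '\n' per line */
};

/* If line is "key value", copy value into dst and return 1; else return 0. */
static int take(const char *line, const char *key, char *dst) {
    size_t k = strlen(key);
    if (strncmp(line, key, k) != 0 || line[k] != ' ') {
        return 0;
    }
    snprintf(dst, FIELD, "%s", line + k + 1);
    return 1;
}

/* Read up to max test cases from in. All fields persist between tests, so
   head, body and tail carry over unless a later line replaces them. Lines
   between "output" and "endoutput" are kept verbatim, blank ones included.
   Lines longer than FIELD - 1 characters are not handled. */
static int parse_tests(FILE *in, struct TestCase *out, int max) {
    struct TestCase tc = {0};
    char line[FIELD];
    int in_output = 0, count = 0;

    while (fgets(line, sizeof line, in) != NULL) {
        line[strcspn(line, "\n")] = '\0';

        if (in_output) {
            if (strcmp(line, "endoutput") == 0) {
                in_output = 0;
                if (count < max) {
                    out[count++] = tc;   /* this test is complete */
                }
            } else {
                size_t used = strlen(tc.output);
                snprintf(tc.output + used, sizeof tc.output - used, "%s\n", line);
            }
            continue;
        }
        if (line[0] == '\0' || line[0] == '#') {
            continue;   /* blank line or comment */
        }
        if (strcmp(line, "output") == 0) {
            in_output = 1;
            tc.output[0] = '\0';
            continue;
        }
        (void) (take(line, "baseaddr", tc.baseaddr) || take(line, "name", tc.name) ||
                take(line, "head", tc.head) || take(line, "body", tc.body) ||
                take(line, "tail", tc.tail));
    }
    return count;
}
```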