It’s no surprise to any programmers reading this that we’re occasionally prone to burnout, or the infamous “coder’s block”, which reduces creativity and motivation to basically zero.
I got stumped a few days back after posting some quick little articles based on forum posts I’d answered. Feeling a bit bored with gaming, I surfed the web aimlessly until I came across this video. I found the whole thing fascinating: DOS was my first operating system, but I had never encountered any of these viruses.
What followed was a refreshing change of pace as I searched the web for any and all information I could find about the viruses of yore. In doing so I rediscovered the zines from the old-school hacking groups, most notably 29A, who were known for their sophisticated malware.
After spending a few days digging up the required information, I decided to jump in and write my very own obsolete virus.
Since DOS is pretty much obsolete and a complete pain in the ass to get working on modern systems, emulation is our best bet. The obvious choice is DOSBox, which is well known and offers nearly 1:1 compatibility with real DOS.
DOSBox should work with zero configuration. If you have difficulty, simply consult its documentation page.
Now we need some tools. Since we’re compiling code for an ancient operating system, we can hardly fire up Visual Studio. Besides, I wanted to code my virus completely in DOS (for that old-school feeling).
I managed to track down a copy of Turbo Assembler (TASM) on this site, which also has some compilation instructions and a simple Hello World application. Nice!
From there it was simply a matter of mounting the directory containing the tools in DOSBox:
mount c c:\Users\timb3r\DOS
Now we can switch to the C: drive and access our goodies:
Z:\> C:
C:\> cd TASM\BIN
To make things easier I wrote a small BAT file that would compile our code with the correct settings:
@echo off
cd C:\SRC\COM
C:\TASM\BIN\TASM /m /l /zi %1.asm
C:\TASM\BIN\TLINK /Tdc %1.obj
DEL *.OBJ
DEL *.MAP
DEL *.LST
ECHO.
Change the paths where appropriate. When invoking the script, pass the name of your source file on the command line without the .asm extension, since the script appends it itself (%1.asm).
The example program is designed to be compiled as an EXE; however, we’re more interested in COM files. Use the code below to generate a properly formatted COM file:
CODE_SEG SEGMENT
ASSUME CS:CODE_SEG
ORG 100H
START:
    jmp MAIN
HelloMessage DB 'timb3r lives in 16-bit',0ah,'$'
MAIN:
    mov ah,9                      ;DOS print string function
    mov dx,OFFSET HelloMessage    ;point to string
    int 21h                       ;display string
    mov ax,4c00h                  ;DOS terminate program function, exit code 0 in AL
    int 21h                       ;terminate the program
CODE_SEG ENDS
END START
Looking inside COM files
If you compile the above and open the result in a hex editor, you can get an idea of how COM files are structured.
Unlike modern formats such as PE32/64, COM files have no header and no relocation information: the file is simply a raw image of the program. Because of this, DOS always loads them at the same offset, 100h, directly after the 256-byte Program Segment Prefix (PSP) it builds for every program. Since DOS has no multitasking, typically only one program runs at a time (though not always: TSRs, terminate-and-stay-resident programs, can linger in memory). DOS also lacks any kind of memory protection or security, meaning any program that runs has unfettered access to whatever memory it likes.
This is probably horrifying to younger readers, but old-school computing typically didn’t have any security at all. The security was the user: you were simply expected to know what you were doing.
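To make the 100h offset concrete, here is a small host-side Python sketch (not DOS code) of real-mode address arithmetic. The load segment 0x1234 is a made-up value for illustration; DOS picks the real one. Because the PSP fills offsets 00h to FFh, byte *i* of the COM file ends up at offset 100h + *i*, and the physical address is segment × 16 + offset.

```python
PSP_SIZE = 0x100  # the Program Segment Prefix occupies offsets 0x00-0xFF


def physical_address(segment: int, offset: int) -> int:
    """Real-mode segment:offset to a 20-bit physical address."""
    return ((segment << 4) + offset) & 0xFFFFF  # an 8086 wraps around at 1 MB


def com_offset(file_index: int) -> int:
    """Offset within the load segment of a given byte of a COM file."""
    return PSP_SIZE + file_index


# Hypothetical load segment; any value works for the example.
load_segment = 0x1234

first_byte = com_offset(0)
print(f"{load_segment:04X}:{first_byte:04X} -> {physical_address(load_segment, first_byte):05X}")
# 1234:0100 -> 12440
```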
Once our COM file is loaded into memory, the processor begins executing at offset 100h, so the first instruction it runs is EB 19: a short jump over the message data to the program start (MAIN).
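Decoding that first instruction by hand is a good exercise. The Python sketch below is a host-side helper (the function name is hypothetical) that decodes a short JMP (opcode EB). Its single operand byte is a signed displacement measured from the end of the 2-byte instruction, so for EB 19 at 100h the target is 102h + 19h = 11Bh.

```python
def short_jmp_target(code: bytes, offset: int = 0x100) -> int:
    """Return the target offset of a short JMP (EB rel8) located at `offset`."""
    if len(code) < 2 or code[0] != 0xEB:
        raise ValueError("not a short JMP (EB xx)")
    disp = code[1] - 0x100 if code[1] >= 0x80 else code[1]  # sign-extend rel8
    return (offset + 2 + disp) & 0xFFFF  # relative to the next instruction


# First two bytes of the compiled example program
print(hex(short_jmp_target(bytes([0xEB, 0x19]))))  # 0x11b
```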
How they worked
Back then, viruses typically appended themselves to the end of the file and then overwrote the jump at the start with a jump to the virus’s own entry point. The virus could either take complete control of the program or hand control back when it was finished. This was accomplished with a bit of clever maths and a near jump instruction, E9 XX XX, whose 16-bit operand is a displacement relative to the following instruction rather than an absolute address.
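The maths boils down to one rule: the operand of E9 is the target offset minus the offset of the instruction after the 3-byte jump. The result is truncated to 16 bits and stored little-endian. A minimal host-side Python sketch of that encoding between two arbitrary offsets (the helper name is hypothetical):

```python
def encode_near_jmp(source: int, target: int) -> bytes:
    """Encode JMP rel16 (E9 lo hi) placed at `source` that lands on `target`."""
    disp = (target - (source + 3)) & 0xFFFF  # wraps within the 64 KB segment
    return bytes([0xE9]) + disp.to_bytes(2, "little")


# A near jump at 100h to 11Bh: 11Bh - 103h = 18h
print(encode_near_jmp(0x100, 0x11B).hex(" "))  # e9 18 00
```

Backward jumps work the same way. The negative difference simply wraps to a large 16-bit value, and the CPU wraps it back when it adds the displacement to IP.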
Even sneakier viruses would remove all traces of themselves from memory before handing control back. That way neither the host program nor the operating system would know they were there.
EXE infectors worked the same way, locating the application’s entry point and replacing it with the virus’s own. However, since EXE files were a touch more complicated, most viruses targeted COM files.
Designing our own infector
Our infector will be very similar to the one mentioned above. Our program’s logic will be as follows:
- Find the first COM file in the current directory.
- Attempt to open it.
- Read the first 4 bytes of the file.
- Check for our virus signature; if it’s present, the file is already infected.
- If not, append ourselves to the end of the file.
- Replace the initial JMP with a near jump (E9) to our program start.
- Keep searching for other COM files.
- If our code is inside a host program we return control back to the program.
Pretty simple, right? However, there’s something we’re overlooking. This is DOS: we can’t just printf to the screen, so we’re going to have to write our own functions for EVERYTHING.
Interfacing with DOS
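Modern operating systems have an extensive API for things like opening files or reading input; DOS has interrupts. What’s an interrupt? Exactly what it sounds like! It interrupts execution of the current program so execution can take place somewhere else, and when the handler finishes, control returns to the instruction after the interrupt. A program triggers one deliberately with the INT instruction.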
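Under the hood, INT n looks up a far pointer in the interrupt vector table at 0000:0000. Each entry is 4 bytes, offset first and then segment. The CPU pushes the flags and the return address and jumps to the handler, and IRET returns from it. A small Python sketch of that lookup arithmetic follows. The entry bytes are made up for illustration, because the real contents depend on the DOS version and any resident programs.

```python
from typing import Tuple


def ivt_entry_address(vector: int) -> int:
    """Physical address of the IVT entry for interrupt `vector` (table at 0000:0000)."""
    return vector * 4


def decode_far_pointer(entry: bytes) -> Tuple[int, int]:
    """Split a 4-byte IVT entry into (segment, offset); the offset is stored first."""
    offset = int.from_bytes(entry[0:2], "little")
    segment = int.from_bytes(entry[2:4], "little")
    return segment, offset


print(hex(ivt_entry_address(0x21)))  # 0x84
# Hypothetical entry bytes for INT 21h
seg, off = decode_far_pointer(bytes([0x40, 0x14, 0x70, 0x00]))
print(f"INT 21h handler at {seg:04X}:{off:04X}")  # INT 21h handler at 0070:1440
```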
The interrupt we’ll use most is 21h, which is how our program communicates with the operating system. You put a function number in AH, fill in whatever other registers that function needs, and execute int 21h. Here’s a fairly complete list of available DOS functions. I can already hear you thinking it: that’s it? That’s essentially everything DOS provides to a program: a handful of file I/O functions, reading from and writing to the screen, and not much else.
But that’s all we need right now.
Printing to screen
Looking at the example program, you can see how this call works. AH is set to 9, which tells DOS we want to write a string to the screen. DS:DX points to our string, which must end with a '$' character (not a zero byte, as a C-style ASCIIZ string would). Then we execute int 21h and the text appears on the screen.
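Because AH=09h stops at the first '$', a string that contains a dollar sign can’t be printed in full with it. A zero-terminated routine like the Printf below doesn’t have that limitation. Here is a host-side Python sketch that contrasts the two terminator conventions. Memory is modelled as a bytes object and the function names are hypothetical.

```python
def dos_string(memory: bytes, offset: int) -> bytes:
    """What INT 21h/AH=09h would print: bytes up to (not including) '$'."""
    end = memory.index(b"$", offset)
    return memory[offset:end]


def asciiz_string(memory: bytes, offset: int) -> bytes:
    """A zero-terminated (ASCIIZ) string, as read by a lodsb loop."""
    end = memory.index(b"\x00", offset)
    return memory[offset:end]


mem = b"Costs $5\x00$"
print(dos_string(mem, 0))     # b'Costs '
print(asciiz_string(mem, 0))  # b'Costs $5'
```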
We’re probably going to be doing this a lot inside our program since we can’t trace our code with a debugger. We’re going to need two functions:
- One to display a message.
- One to dump out the value of a register.
The first one is simple enough; however, the second one gets complicated very quickly. If your programming maths is a bit rusty, you may want to brush up on it first, because we’re talking nibbles, bytes and words, my friends.
; Print a zero-terminated string to the screen
; si = string to print
; cx = print new line (0/1)
Printf proc
    ; Save the registers we modify
    push ax
    push bx
    push dx
    push si
    xor bx, bx              ; int 10h: page 0 (bh), colour 0 (bl)
display_char:
    ; Load the next byte into al
    lodsb
    ; Zero terminator? Then we're done (don't print it)
    test al, al
    jz _print_done
    ; Print the character using the BIOS teletype function
    mov ah, 0eh
    int 10h
    jmp display_char
_print_done:
    ; Print newline?
    cmp cx, 1
    jne _print_exit
    ; Yes, print CR then LF
    mov ah, 02h
    mov dl, 0dh
    int 21h
    mov ah, 02h
    mov dl, 0ah
    int 21h
_print_exit:
    ; Restore our registers
    pop si
    pop dx
    pop bx
    pop ax
    ret
Printf endp
Now we can easily invoke our code like so:
szMsg DB "This is a message", 0    ; keep data outside the execution path

    mov cx, 1                      ; print a newline afterwards
    mov si, OFFSET szMsg           ; si = string to print
    call Printf
That was easy! Now for the hard part: dumping a register.
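; Hex digit lookup table (keep it outside the execution path)
szHex db "0123456789ABCDEF",0

; Dump AX register to screen in hex
PRINT_AX proc
    pushf
    push ax
    push cx
    push dx
    mov cx, 2               ; two bytes: high byte first, then low byte
_next_byte:
    push ax
    push cx
    dec cx                  ; cx = 1 (high byte) or 0 (low byte)
    ; shift by an immediate count other than 1 needs a 186 or later;
    ; add .186 at the top of the file if TASM complains
    shl cx, 3               ; cx = 8 or 0 bits to shift
    shr ax, cl              ; move the wanted byte into al
    and ax, 0ffh            ; clear ah so only that byte is left
    call _PRINT_AL
    pop cx
    pop ax
    dec cx
    jnz _next_byte
    ; Finish with CR LF
    mov ah, 02h
    mov dl, 0dh
    int 21h
    mov ah, 02h
    mov dl, 0ah
    int 21h
    pop dx
    pop cx
    pop ax
    popf
    ret
PRINT_AX endp

; Print AL as two hex digits
_PRINT_AL proc
    push bx
    mov bl, al              ; keep a copy of the byte
    shr al, 4h              ; high nibble
    call _PRINT_NIBBLE
    mov al, bl
    and al, 0fh             ; low nibble
    call _PRINT_NIBBLE
    pop bx
    ret
_PRINT_AL endp

; Print the hex digit for the nibble in AL
_PRINT_NIBBLE proc
    push si
    push ax
    push di
    push dx
    xor ah, ah              ; make sure ax holds just the nibble
    mov si, OFFSET szHex
    add si, ax              ; index into the lookup table
    mov dl, [si]
    mov ah, 02h
    int 21h
    pop dx
    pop di
    pop ax
    pop si
    ret
_PRINT_NIBBLE endp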
Now, to invoke it, simply move the value you want to display into AX:
    mov ax, 101h
    call PRINT_AX
This prints 0101.
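For comparison, here is the same byte-and-nibble arithmetic as PRINT_AX, written as a host-side Python sketch (the function name is hypothetical). It shifts the word to isolate each byte, high byte first. Then it splits each byte into two nibbles and uses each nibble as an index into the "0123456789ABCDEF" table.

```python
HEX_DIGITS = "0123456789ABCDEF"  # same lookup table as szHex


def format_word(ax: int) -> str:
    """Render a 16-bit value the way PRINT_AX does: four hex digits, high byte first."""
    digits = []
    for shift in (8, 0):                        # high byte, then low byte
        byte = (ax >> shift) & 0xFF             # like shr ax, cl / and ax, 0ffh
        digits.append(HEX_DIGITS[byte >> 4])    # high nibble (shr al, 4)
        digits.append(HEX_DIGITS[byte & 0x0F])  # low nibble (and al, 0fh)
    return "".join(digits)


print(format_word(0x101))   # 0101
print(format_word(0xBEEF))  # BEEF
```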
Phew! That’s enough for now.
In the next article we’ll go into the actual infection process in depth.