How tape loaders work

Introduction

This loosely continues from my parallel-port transfer article. The theme is similar, data communication, but this time it's about how those ancient tape loaders worked on the ZX Spectrum and the other 8-bit machines that are now mostly emulated. It focuses on how bytes of data were reliably stored on, and retrieved from, audio tape.

How to store 1 bit on tape

The main thing you should know about audio tape is that a high or low signal can come back reversed (don't ask me how, it just does). This means you can't rely on high=1 and low=0. Instead you must look for transitions, where the signal goes low-to-high or high-to-low. Such a transition is called an 'Edge'. Two of these edges mark the start and end of a 'Pulse' (a click on the tape).

       <-pulse->       <-pulse->       <-pulse->
hi     ┌───────┐       ┌───────┐       ┌───────┐
       │       │       │       │       │       │
       │       │       │       │       │       │
       │       │       │       │       │       │
lo   ──┘       └───────┘       └───────┘       └....
      edge    edge    edge    edge    edge    edge 

                   d               d               d
      time ---------------------->
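
To make the edge and pulse idea concrete, here is a minimal Python sketch (Python rather than assembler, purely for readability). It assumes the tape signal has already been sampled into a list of 0/1 levels, and the names find_edges and pulse_lengths are made up for this example. Because only the changes are looked at, a reversed signal measures exactly the same.

```python
def find_edges(levels):
    """Return the sample positions at which the level changes."""
    edges = []
    for i in range(1, len(levels)):
        if levels[i] != levels[i - 1]:     # low-to-high or high-to-low
            edges.append(i)
    return edges


def pulse_lengths(levels):
    """Distance between each pair of neighbouring edges, in samples."""
    edges = find_edges(levels)
    return [b - a for a, b in zip(edges, edges[1:])]


signal = [0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0]
inverted = [1 - s for s in signal]         # the same tape, played back reversed

print(pulse_lengths(signal))               # [4, 4, 4]
print(pulse_lengths(inverted))             # [4, 4, 4]
```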

We can only store data on an audio tape by means of these pulses. In order to distinguish between a '1' and a '0' we must vary the length of a pulse. So a '0' could be represented by, for example, 100 time ticks and a '1' could be represented by 200 time ticks.

      '0'                '1'                 '0'
   ┌───────┐       ┌──────────────┐       ┌───────┐
   │       │       │              │       │       │
   │       │       │              │       │       │
   │       │       │              │       │       │
 ──┘       └─.....─┘              └─.....─┘       └─...
   <--100-->       <-----200------>       <--100-->

The above shows how the length of a pulse can be used to denote a '0' (100 ticks) or a '1' (200 ticks). There is another important thing to remember here: audio tape can stretch over time. This means that when you come to read these pulses back (by measuring the length of each click on tape) they might be shorter or longer than when they were written, both because of tape stretch and because of differences between tape decks (a tape motor might run slightly slower or faster over time). So a loader has to compare each pulse against a range of lengths rather than an exact value.
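
As a rough illustration, the Python sketch below uses the example lengths of 100 and 200 ticks and puts a threshold halfway between them. The threshold of 150 and the function names are my own assumptions for the example. A tape that plays back 20% slow still decodes correctly, because every pulse stays on the right side of the threshold.

```python
ZERO_TICKS = 100
ONE_TICKS = 200
THRESHOLD = (ZERO_TICKS + ONE_TICKS) // 2      # 150, halfway between the two


def bits_to_pulses(bits):
    """Pulse length to write for each bit."""
    return [ONE_TICKS if bit else ZERO_TICKS for bit in bits]


def pulses_to_bits(pulses, threshold=THRESHOLD):
    """Anything longer than the threshold is a '1', anything else a '0'."""
    return [1 if length > threshold else 0 for length in pulses]


bits = [0, 1, 0, 0, 1, 1]
pulses = bits_to_pulses(bits)
stretched = [round(length * 1.2) for length in pulses]   # deck running 20% slow

print(stretched)                   # [120, 240, 120, 120, 240, 240]
print(pulses_to_bits(stretched))   # [0, 1, 0, 0, 1, 1]
```

With a single fixed threshold like this the scheme only survives a limited amount of stretch. Once a '0' grows past 150 ticks, or a '1' shrinks to 150 or below, the bits come back wrong.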

Time-outs

A tape loader works by measuring time. It sits in a loop and counts the number of iterations between two edges. The loop also needs to handle time-outs. An obvious way to do this is by using an increasing or decreasing counter. The counter can be used both for measuring the length of a pulse and for checking for a time-out. Each pass around the loop the counter is updated. If it reaches 0 then we have a time-out, so we must jump to some default state and/or error routine.
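
Here is a minimal Python sketch of such a loop, using a decreasing counter. The tape input is faked by make_reader, a hypothetical helper that hands out one pre-recorded level per call. On a real machine this would be a port read. The time-out value of 255 is just an assumption that suits an 8-bit counter.

```python
def make_reader(samples):
    """Fake tape input: each call returns the next level, then repeats the last."""
    it = iter(samples)
    last = [0]

    def read_level():
        last[0] = next(it, last[0])
        return last[0]

    return read_level


def measure_pulse(read_level, timeout=255):
    """Count loop passes until the level changes. Returns None on a time-out."""
    start = read_level()
    counter = timeout
    while counter:
        counter -= 1                       # one pass around the loop
        if read_level() != start:          # found the edge
            return timeout - counter       # passes taken = length of the pulse
    return None                            # counter reached 0: time-out


print(measure_pulse(make_reader([1, 1, 1, 1, 0])))       # 4
print(measure_pulse(make_reader([1] * 10), timeout=5))   # None
```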

The pulse measurement can (and must) handle more than just '0' and '1' encoding. It needs to handle 'Sync' and 'Header' pulses (possibly 'Trailer' ones too). All these need to be unique, which means their lengths must be easily distinguishable from each other. This can be done by doubling or halving the '1' and '0' pulse lengths respectively. One scheme could look like this:

           header pulse
  ┌────────────────────────────┐
  │                            │
  │                            │
  │                            │
──┘                            └....
  <------------400------------->
    sync              '1'                 '0'
   ┌────┐       ┌──────────────┐       ┌───────┐
   │    │       │              │       │       │
   │    │       │              │       │       │
   │    │       │              │       │       │
 ──┘    └─.....─┘              └─.....─┘       └─...
   <-50->       <-----200------>       <--100-->
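
One way to tell these four pulse types apart is to put a threshold halfway between each pair of neighbouring lengths. The Python sketch below does that for the example scheme above. The thresholds (75, 150, 300) and the 600-tick time-out limit are my own picks for illustration, not values from any real loader.

```python
def classify_pulse(ticks):
    """Name a measured pulse, using the example lengths 50/100/200/400."""
    if ticks < 75:          # halfway between sync (50) and '0' (100)
        return "sync"
    if ticks < 150:         # halfway between '0' (100) and '1' (200)
        return "0"
    if ticks < 300:         # halfway between '1' (200) and header (400)
        return "1"
    if ticks < 600:         # assumed upper limit for a header pulse
        return "header"
    return "timeout"


for ticks in (50, 110, 190, 420, 1000):
    print(ticks, classify_pulse(ticks))
```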

The Spectrum recorded pulses in equally measured 'cycles', where the 'hi' time span and the following 'lo' time span were identical in length. In short, the pulse was more or less duplicated. Here is a scaled-down example to demonstrate.

  <-----header-----><--sync--><------'0'------>
  ┌────────┐        ┌────┐    ┌──────┐
  │  400   │        │ 50 │    │ 100  │
  │        │        │    │    │      │
  │        │        │    │    │      │
──┘        └────────┘    └────┘      └──────
              400          50           100

As you might notice, the top and bottom are identical. This means we could take the top OR the bottom spans and use them to measure the pulses. In effect the pulses are recorded twice.
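
A tiny Python sketch of the same point, using the scaled-down lengths from the diagram. The function name record_cycles is made up for this example.

```python
def record_cycles(lengths):
    """Write each pulse length twice: once as a 'hi' span, once as a 'lo' span."""
    spans = []
    for ticks in lengths:
        spans.append(("hi", ticks))
        spans.append(("lo", ticks))
    return spans


spans = record_cycles([400, 50, 100])
hi_only = [ticks for level, ticks in spans if level == "hi"]
lo_only = [ticks for level, ticks in spans if level == "lo"]

print(hi_only)   # [400, 50, 100]
print(lo_only)   # [400, 50, 100]
```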

Detecting an 'Edge'

An 'edge' is a transition from one state to another. In other words, when something has changed. We can easily check for a change by using the nice XOR operation (which is an inverted bitwise compare, if you didn't know already).

     A       B    A xor B    not (A xor B)
   ─────   ─────  ───────    ─────────────
     0       0       0               1
     0       1       1               0
     1       0       1               0
     1       1       0               1

The above shows that a 'not (A xor B)' operation performs a bit compare, which is useful to know if you need to recolour an image stored as bitplanes. The normal 'A xor B' operation can be used to see if a bit has changed state.
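
The same table, and the 'has this bit changed?' test, can be sketched in a few lines of Python. The function name bit_changed and the mask parameter are assumptions for this example. The mask simply picks out which bit of the two readings we care about.

```python
def bit_changed(previous, current, mask):
    """True if any bit selected by mask differs between the two readings."""
    return ((previous ^ current) & mask) != 0


# Reproduce the truth table: A, B, A xor B, not (A xor B)
for a in (0, 1):
    for b in (0, 1):
        print(a, b, a ^ b, 1 - (a ^ b))

print(bit_changed(0b00000000, 0b00001000, 0b00001000))   # True
print(bit_changed(0b00001000, 0b00001000, 0b00001000))   # False
```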

To check for a change (an edge), we must first read the current state of something. In this example I will use the VR (vertical retrace) bit in the video card's status port (3DAh).

        mov     dx, 03DAh
        in      al, dx
        mov     ah, al          ; ah=inputb(0x3DA)

Now let's wait until the VR bit (bit 3) has changed, using an XOR.

no_change:
        in      al, dx          ; read 3DAh port
        xor     al, ah          ; \ bitwise
        not     al              ; / compare
        test    al, 00001000b
        jnz     no_change       ; still the same?

Of course we can remove that 'not al' by inverting the 'jnz' condition.

no_change:
        in      al, dx          ; read 3DAh port
        xor     al, ah
        test    al, 00001000b
        jz      no_change       ; still the same?

A ZX Spectrum tape file

A file consists of two sections: the 17-byte header block (which contains the filename, file size and type) and the data block (the actual data itself). Between the two sections there is a slight delay, as there is before EVERY tape block. This gives the loading machine time to do certain house-keeping tasks, like comparing filenames, checking the file type and so on.

Each block had three parts: header pulses, a sync pulse and the bitstream (the '0's and '1's). The header pulses were made up of about 5 seconds' worth of pulses, 2168 T-states ON then 2168 T-states OFF. (A 'T-state' is one machine clock cycle.)

                   Header pulses

         2168            2168             2168
hi     ┌───────┐       ┌───────┐       ┌───────┐
       │       │       │       │       │       │
       │       │       │       │       │       │
       │       │ 2168  │       │ 2168  │       │ 2168
lo   ──┘       └───────┘       └───────┘       └....

      repeat for about 5 seconds-------->

The Sync pulse was 667 T-states ON then 735 T-states OFF.

                   Sync pulse
         667           
hi     ┌─────┐
       │     │
       │     │
       │     │ 735
lo   ──┘     └─────┘

The data stream was made up from either '0' pulses (855 T-states ON, 855 T-states OFF) or '1' pulses (1710 T-states ON, 1710 T-states OFF) depending on each bit of each byte being stored on tape.

                   the bitstream

       <--'0'-><--'0'-><-----'1'------>
        855     855      1710
hi     ┌───┐   ┌───┐   ┌───────┐
       │   │   │   │   │       │
       │   │   │   │   │       │
       │   │855│   │855│       │ 1710
lo   ──┘   └───┘   └───┘       └───────

      serial data---------->

It is important to note that the header pulses are longer than both the sync and bitstream pulses. This means a loader could not confuse them and try to use the 5-second header ('get ready for sync') pulses as data. The header pulses would cause a time-out because they are longer than any bitstream pulses.
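
To put these numbers together, here is a Python sketch that lays out one tape block as a list of pulse lengths in T-states. The 3.5 MHz clock is an assumption on my part (the article does not give the clock speed), and the function names are invented for the example. The bit order, bit 7 first, matches the loader description that follows.

```python
CLOCK_HZ = 3_500_000       # assumed CPU clock; not stated in the article

HEADER = 2168              # T-states per header pulse
SYNC_ON, SYNC_OFF = 667, 735
ZERO, ONE = 855, 1710


def tstates_to_us(tstates):
    """Convert a pulse length in T-states to microseconds."""
    return tstates * 1_000_000 / CLOCK_HZ


def byte_to_pulses(value):
    """Two equal pulses (ON then OFF) per bit, bit 7 first."""
    pulses = []
    for bit in range(7, -1, -1):
        length = ONE if (value >> bit) & 1 else ZERO
        pulses += [length, length]
    return pulses


def block_to_pulses(data, header_seconds=5):
    """Header pulses for about header_seconds, then sync, then the bitstream."""
    header_count = int(header_seconds * CLOCK_HZ / HEADER)
    pulses = [HEADER] * header_count + [SYNC_ON, SYNC_OFF]
    for value in data:
        pulses += byte_to_pulses(value)
    return pulses


print(round(tstates_to_us(HEADER), 1))     # 619.4 (microseconds)
print(byte_to_pulses(0x80)[:4])            # [1710, 1710, 855, 855]
```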

The Loader

A loader is basically made up of three main loops. The first loop searches for the long header pulses, to make sure we are starting at the beginning of a tape file and not halfway through the data bitstream. The border colour is toggled between red and cyan after each edge is found.

The second loop continues accepting header pulses but also looks for a sync pulse. The sync pulse is found by examining the time-out/counter value to see how long a pulse is.

The third loop again measures each pulse and determines if it represents a '0' or a '1'. The border colour is toggled between blue and yellow for each edge. This loop reads every bit of every byte and stores them in memory. Bits are written in the order 76543210.

If a time-out occurs within any of the above three loops then the whole process can be restarted.
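
The three loops can be sketched in Python as below. This is not the Spectrum ROM routine, only an illustration working on a list of already measured pulse lengths (in T-states, using the timings from the tape file section). The threshold values are simply midpoints I picked, the minimum of 100 header pulses is an assumption, and save_block is a made-up helper that produces test data. Border colours are left out.

```python
HEADER_MIN = 1939      # midway between a '1' pulse (1710) and a header pulse (2168)
PULSE_MAX = 3000       # assumed limit: anything longer is treated as a time-out
SYNC_MAX = 795         # midway between sync OFF (735) and a '0' pulse (855)
CYCLE_SPLIT = 2565     # midway between a '0' cycle (2*855) and a '1' cycle (2*1710)


def save_block(data, header_pulses=200):
    """Made-up helper: header pulses, sync, then two pulses per bit."""
    pulses = [2168] * header_pulses + [667, 735]
    for value in data:
        for bit in range(7, -1, -1):
            length = 1710 if (value >> bit) & 1 else 855
            pulses += [length, length]
    return pulses


def load_block(pulses, min_header=100):
    """Return the decoded bytes, or None if the caller should start again."""
    pos = 0
    run = 0
    # Loop 1: search for a steady run of long header pulses.
    while run < min_header:
        if pos >= len(pulses):
            return None                                  # ran out of tape
        run = run + 1 if HEADER_MIN < pulses[pos] < PULSE_MAX else 0
        pos += 1
    # Loop 2: keep accepting header pulses until a short sync pulse turns up.
    while pos < len(pulses) and HEADER_MIN < pulses[pos] < PULSE_MAX:
        pos += 1
    if pos + 1 >= len(pulses) or max(pulses[pos], pulses[pos + 1]) > SYNC_MAX:
        return None                                      # no sync found
    pos += 2
    # Loop 3: measure each cycle, build bytes bit 7 first.
    data = bytearray()
    while pos + 16 <= len(pulses):
        value = 0
        for _ in range(8):
            first, second = pulses[pos], pulses[pos + 1]
            if max(first, second) > HEADER_MIN:
                return None                              # too long for data: time-out
            value = (value << 1) | (1 if first + second > CYCLE_SPLIT else 0)
            pos += 2
        data.append(value)
    return bytes(data)


print(load_block(save_block(b"OK")))                     # b'OK'
print(load_block([855] * 500))                           # None
```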

Turbo (Nova) Loaders

Those of you who remember seeing these things will surely remember having to wait minutes for a small file to load from tape using the standard slow tape routines. The old Commodore 64 probably had the largest number of strange (but interesting) turbo-loaders around. A 'Turbo-loader' is basically a custom tape load routine whose pulses are much shorter than a standard loader's. This obviously means that a file will load much faster, in some cases at 4x normal speed.

But there is a price to pay for quicker (shorter) pulses: reliability. Cheap, standard audio tapes have a limited frequency range, so lots of high frequencies are lost when recording onto them. This is why Chrome and other audio tape materials were developed. By making the tape pulses shorter you increase their frequency, and some tapes (and tape decks) can't handle these high frequencies. In fact quite a large number of software titles suffered from this problem. Some had to be re-released in a slower version, or had a slower loading version placed on Side B.
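
A quick bit of arithmetic shows how fast the frequency climbs. The Python sketch below assumes a 3.5 MHz clock (my assumption, not given in the article) and the standard Spectrum pulse lengths from earlier. The speedup parameter is hypothetical and just divides every pulse length by that factor.

```python
CLOCK_HZ = 3_500_000   # assumed CPU clock


def tone_hz(pulse_tstates, speedup=1):
    """Frequency of a square wave whose ON and OFF halves each last pulse_tstates."""
    return CLOCK_HZ * speedup / (2 * pulse_tstates)


for name, length in (("header", 2168), ("'1'", 1710), ("'0'", 855)):
    print(name, round(tone_hz(length)), round(tone_hz(length, speedup=4)))
```

Under these assumptions a standard '0' is a tone of roughly 2 kHz, while at 4x speed it would be over 8 kHz, which is a lot to ask of a cheap tape and deck.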

Multi-loaders and Blocks

Some of the later software broke a file down into parts (much like the Internet does now). Each part was given an ID, like 37 of 50, and was loaded separately. This meant that if a tape glitch occurred the user only had to rewind the tape a short distance and reload the failed block. As most programs took 3 to 10 minutes to load this was a real life saver.

If you're developing some communication software then breaking a huge file down into smaller blocks is a good idea. For one, you only need to resend those blocks which were corrupted. For two, it gives another small level of data integrity: you know that each block can only have between 1 and N bytes in it, so if you receive a block bigger than your maximum packet size then you know there is a problem. Also, like the Internet, you can pass information in both directions by using a read, write, read, write, etc. scheme. You can acknowledge that each block has been successfully received, or you can ask for a resend.
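
Here is one way the block idea might look in Python. Everything in it is an assumption for the sake of illustration: the dictionary layout, the function names, and the use of a simple XOR of all bytes as the checksum. The receiver checks both the size limit and the checksum, then lists the block IDs it wants sent again.

```python
from functools import reduce


def xor_checksum(payload):
    """XOR of every byte in the payload (an assumed, very simple checksum)."""
    return reduce(lambda a, b: a ^ b, payload, 0)


def split_into_blocks(data, block_size):
    """Cut data into numbered blocks: 'id' of 'of', payload and checksum."""
    total = (len(data) + block_size - 1) // block_size
    blocks = []
    for n in range(total):
        payload = data[n * block_size:(n + 1) * block_size]
        blocks.append({"id": n + 1, "of": total,
                       "payload": payload, "checksum": xor_checksum(payload)})
    return blocks


def block_ok(block, max_size):
    """A block is good if its size is 1..max_size and its checksum matches."""
    payload = block["payload"]
    return 1 <= len(payload) <= max_size and xor_checksum(payload) == block["checksum"]


def blocks_to_resend(received, max_size):
    """IDs of the blocks the receiver should ask for again."""
    return [block["id"] for block in received if not block_ok(block, max_size)]


blocks = split_into_blocks(b"HELLO TAPE LOADER", 5)
damaged = blocks[1]["payload"]
blocks[1]["payload"] = bytes([damaged[0] ^ 1]) + damaged[1:]   # flip one bit in block 2

print(len(blocks))                     # 4
print(blocks_to_resend(blocks, 5))     # [2]
```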

Developing your own Loaders

This isn't (or at least wasn't) as hard as it may first sound. The hardest part is generating a steady frequency on tape. On those 8-bit machines the CPU ran at a constant rate, so this was pretty easy. A simple delay loop was inserted between two OUT instructions to generate a pulse. Nowadays with different CPU speeds this isn't easy. Most of the emulators use the sound-card DMA to record or play at a specific, constant rate.
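
The constant-rate approach can be sketched in Python like this: instead of timing with a delay loop, each pulse length is converted into a number of samples at a fixed sample rate. The 3.5 MHz clock, the 44100 Hz sample rate and the function name are all assumptions for the example. Note that it works from the running total of T-states, so rounding errors do not pile up from one pulse to the next.

```python
def pulses_to_samples(pulses, clock_hz=3_500_000, sample_rate=44_100):
    """Turn pulse lengths (in T-states) into a list of 0/1 sample levels."""
    samples = []
    level = 1                    # start HI; a loader only cares about edges
    elapsed = 0                  # running total in T-states
    for length in pulses:
        elapsed += length
        end = round(elapsed * sample_rate / clock_hz)    # sample where this pulse ends
        samples.extend([level] * (end - len(samples)))
        level ^= 1               # every pulse boundary is an edge
    return samples


one_zero_bit = pulses_to_samples([855, 855])
print(len(one_zero_bit))         # 22 samples: 11 HI then 11 LO
```

A pulse shorter than one sample would vanish completely here, which is the sample-rate version of the frequency limit that turbo loaders run into.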

Firstly, don't jump in and try to do everything in one go. Take time to experiment and develop your loader and saver routines in small steps. Also don't be tempted to use very short delays; longer pulses are easier to work with. You can always speed up the routines after you have got them working!

(1) Write 1000 pulses to tape using a HI, delay, LO, delay, HI, delay, etc. loop.

(2) Play the tape back through the speaker. Can you hear anything? If not increase the delay until you can hear a distinct, steady square-wave tone.

(3) Now code a routine to read in 100 or so of these 1000 pulses. Look for an edge, count the number of iterations around the loop until you find another edge (i.e. measure the pulse). Store the time-out/counter values in memory and look at them. There will be slight variations, don't worry.

(4) Double the save routine delay and look at the resulting values using the step (3) method. You should now have a means of representing '0' and '1'.

(5) Modify the save routine to output 1000 alternating short ('0') and long ('1') pulses on tape (i.e. write 01010101010101010... binary to tape).

(6) Read and measure these pulses using step (3) and try to find a time-out/counter value which can reliably tell the difference between a '0' and a '1'. You may need to play around with the threshold value used for the CMP to decide between a '0' and '1'.

(7) Modify your save routine to write 100 even longer header pulses followed by 1000 '1010101010...' pulses.

(8) Try to find a value which can distinguish between the extra long header and the '1' pulses.

(9) Now try recording a text message onto tape. Write 100 header pulses followed by the text itself, 8 pulses per character (one for each bit).

(10) Tweak the threshold values and delays until you can reliably load that text message from tape.
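
For steps (6) to (10), the Python sketch below shows one possible way to pick a threshold from a batch of measured counter values, and then uses it to decode a short text message. The measured values are invented for the example (they are not real tape data), and the 'widest gap' method and all function names are my own assumptions. It needs both short and long pulses in the batch to work.

```python
def pick_threshold(counts):
    """Put the threshold in the middle of the widest gap between measured values."""
    ordered = sorted(set(counts))
    low, high = max(zip(ordered, ordered[1:]), key=lambda pair: pair[1] - pair[0])
    return (low + high) / 2


def text_to_bits(text):
    """Eight bits per character, bit 7 first."""
    return [(byte >> bit) & 1 for byte in text.encode("ascii") for bit in range(7, -1, -1)]


def bits_to_text(bits):
    """Rebuild characters from groups of eight bits, bit 7 first."""
    chars = []
    for i in range(0, len(bits) - 7, 8):
        value = 0
        for bit in bits[i:i + 8]:
            value = (value << 1) | bit
        chars.append(chr(value))
    return "".join(chars)


# Invented counter values: short pulses near 50, long pulses near 100.
measured = [48, 51, 50, 99, 102, 49, 101, 100]
threshold = pick_threshold(measured)

# Pretend load: each bit comes back as a count with a little jitter added.
sent = text_to_bits("HELLO")
counts = [(100 if bit else 50) + (i % 3) - 1 for i, bit in enumerate(sent)]
received = bits_to_text([1 if count > threshold else 0 for count in counts])

print(threshold, received)       # 75.0 HELLO
```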

Closing words

Ah well, I hope you have enjoyed this trip down memory lane. If you are interested in tape loaders then kick your favourite Internet search engine and look for ZX Spectrum or Commodore 64 emulators and documentation. No doubt there are plenty of turbo-loader source code files out there; it's just a question of finding them.

Happy tape loading!

TAD