Stacks, Pipes and Circular Buffers



These are the three most useful ways to use a block of memory as a buffer. Each of them have distinct advantages and disadvantages. In this article I hope to explain them and give a few tips along the way.

Although this article is aimed at newbies coder in general, more experienced coders may be find it worthwhile to refresh their memory.


This is probably the most popular way to use a block of memory. Ask any chip maker if they can find a use for stacks and they will laugh themselves silly. To say they are vital is a gross understatement.

I found the best way to describe and learn about stacks is to think of it as a pile ("a stack") of books, where each book is a data item which has been pushed onto the stack.

We can PUSH an item onto this stack of books simply by adding another book.

Or we can POP an item off the top of the stack by removing the top book.

A stack in a computer needs a "stack-pointer". This is a register which contains the memory address of the top of the stack in memory. As items are pushed onto the stack this "stack-pointer" decreases to reflect the new address of the top of the stack. And likewise when an item is poped off the stack, the stack-pointer is increased.

On the 80x86 the stack-pointer is represented by two registers, one for the segment and one for the offset. These registers are called SS and SP.

   Address        Value
 ------------    -------
  327A:1000       ----
  327A:1002       ----
  327A:1004       ----
  327A:1006       28A5     <------- SS:SP (the "stack pointer")
  327A:1008       7621
  327A:100A       1234              SS:SP = 327A:1006 hex
  327A:100C       8726

Values marked with "----" mean that they are unused and will be used later on as the stack expands downwards in memory when more items are pushed onto the stack.

The above diagram shows a stack with four items pushed onto it and the SS:SP registers which indicate where the top of the stack currently is.

Suppose we PUSH the value FF55 hex onto the stack. The stack-pointer SS:SP is decreased and the new value it written at this address.

   Address        Value
 ------------    -------
  327A:1000       ----
  327A:1002       ----
  327A:1004       FF55     <------- SS:SP (the "stack pointer")
  327A:1006       28A5
  327A:1008       7621
  327A:100A       1234              SS:SP = 327A:1004 hex
  327A:100C       8726                           ****

Now suppose we push another value, 0000, onto the stack.

   Address        Value
 ------------    -------
  327A:1000       ----
  327A:1002       0000     <------- SS:SP (the "stack pointer")
  327A:1004       FF55
  327A:1006       28A5
  327A:1008       7621
  327A:100A       1234              SS:SP = 327A:1002 hex
  327A:100C       8726                           ****

As you can see the stack "expands down". This means the stack-pointer SS:SP starts at the last byte in the buffer and grows downwards through memory as more items are pushed onto it.

Now let's POP some values off the stack. With the first pop we will get the value 0000 because this is the top most item on the stack. If we pop again then we get the value FF55, then 28A5, then 7621, etc.

Just like a pile of books you must take the top most from the pile. You CANNOT take the bottom book or the Nth book from the top.

The important thing to note about stacks is that they are FILO and LIFO buffers (First In Last Out) and (Last In First Out). These both mean the same thing. They're just a fancy way to say "like a pile of books".

Being FILO/LIFO structures does mean that the order in which items are pushed onto the stack must be reversed when poping them back off. For example, pushing the five items A,B,C,D,E onto the stack in this order you must pop them in this order E,D,C,B,A.


The stack is a one-ended buffer. This means items are put in and taken out from the same place (the top of the stack).

Pipes are two-ended buffer. The best way to think about a pipe is.. like a water pipe or a conveyor belt in a production line. As new items are put in at one end, a much older item is taken out of the other end. The 80x86 SHL and SHR instructions are examples of pipes, where each bit is moved along until the last bit falls out of the end.

Pipes are FIFO (First In First Out) or LILO (Last In Last Out) structures. If we used the pile of book example again, a pipe means that you can ONLY take the bottom book from underneath all the others.

Say we have 6 items labelled A, B, C, D, E and F inside a 5 item pipe.

         ----- ----- ----- ----- ----- -----          -----
           A     B     C     D     E     F    <----     X
         ----- ----- ----- ----- ----- -----          -----

Now we want to put a new item, X, into the pipe. We could shunt all the other items along in the pipe and put the new one into the end.

         ----- ----- ----- ----- ----- -----
     A     B     C     D     E     F     X
         ----- ----- ----- ----- ----- -----

As you can see the item (A) at the opposite end of the pipe was pushed out to make room for (X).

The advantage of a pipe over a stack is that items are stored and retrieved in the correct order. If we put in A, B, C, D, E then we get out A, B, C, D, E in this order. Using a stack we would get E, D, C, B, A the reverse order.

Speed: Moving the pipe.

There is a problem when using the pipe method, especially for large pipes with a great number of items inside, speed. The problem is that before we a put the new item into pipe we must move each and every old item down one position. As you can imagine if this pipe is 100 Kb or more this can take a long time, even when using REP MOVSB or other fast memory move techniques.

But the contents inside the pipe do not change, only the two ends of the pipe. We can use a similar technique to the stack-pointer except we need two pointers, one for the "head" (where new items will be put in) and one for the "tail" (where old items are taken out from).

         Tail                          Head
         ----- ----- ----- ----- ----- -----
           A     B     C     D     E     F
         ----- ----- ----- ----- ----- -----
 Address  100   101   102   103   104   105

In the above example Tail=100 and Head=105. Now suppose we wanted to insert the new item (X) at the head end. Instead of moving A, B, C, D, E, F we just update Tail and Head.

                Tail                         Head
         ----- ----- ----- ----- ----- ----- -----
           A     B     C     D     E     F     X
         ----- ----- ----- ----- ----- ----- -----
 Address  100   101   102   103   104   105   106

Now Tail=101 and Head=106. We can continue this for two more items (Y and Z):

                            Tail                         Head
         ----- ----- ----- ----- ----- ----- ----- ----- -----
           A     B     C     D     E     F     X     Y     Z
         ----- ----- ----- ----- ----- ----- ----- ----- -----
 Address  100   101   102   103   104   105   106   107   108

Now Tail=103 and Head=108.

But suppose we continue this method, eventually we will run out of memory or overwrite some important area of memory.

Circular Buffers

The solution is simple. Once you reach the end of your pipe buffer wrap around back to the start. This will always mean that other areas of memory will not be overwritten not matter how many items are inserted into the pipe.

If we take the last example and make 108 the end address of the buffer and 100 the start address then we can see that trying to add another item (W) it needs to be written at address 100.

         Head                    Tail
         ----- ----- ----- ----- ----- ----- ----- ----- -----
           W     B     C     D     E     F     X     Y     Z
         ----- ----- ----- ----- ----- ----- ----- ----- -----
 Address  100   101   102   103   104   105   106   107   108

This continues until the Tail also reaches the end of the buffer (108) at this point it too must be wrapped around back to address 100.

NOTE: An important thing to remember when using Pipes is that when calculating the number of items in the pipe from the Tail and Head pointers you must take into account the wrap around. This is a case of checking if Head < Tail and then adding the length of the buffer to Head before doing the calculation.

Quick Circular Wraps

Instead of using actual addresses for Tail and Head, indexes are often used from the start of the buffer. A power of two is normally used for the size of the buffer because the wrap around can then be done using a logical AND operation.

For example, a buffer size of 2^8 = 256 can be done using AND 255. Of course there is a much easier way to perform this in 80x86, using the built in MOD 256 operation of a byte variable.

An old technique which was used a great deal on the old Amiga and Atari ST was the overflow buffer copy method. This was often needed for scrolling the screen because the video hardware did not wrap around in a small 64Kb block of memory as the PC does. The technique used a double width buffer which contained an exact copy of the first half of it.

         <--1st half --> <--2nd half -->
         ------- -------
         -------- --------
 position 0 1 2 3 4 5 6   7 8 9 a b c d

The width of the screen is 7 characters (A,B,C,D,E,F,G). We start viewing the screen from position 0, so we see ABCDEFG. Then as we move right we see BCDEFGA (position 1..7). If we move right again we see CDEFGAB (position 2..8). Of course as we move by one position we update the right edge in both halves.

This continues until position = 7, at this point we see ABCDEFG (position 7..d) and so we can reset the position back to 0 to see ABCDEFG (position 0..6).

The nice thing about this technique is that you never need to perform the wrap around on the 2 index values, only on the start position. This was really important on the Amiga/ST because it was either impossible (the hardware didn't wrap) or too slow (an AND instruction took 4 Clock-cycles, on a 8Mhz machine that was sometimes just TOO much.

Fast Allocating/Freeing structures

This is a nice, quick trick told to me by a fellow coder (hi Phil).

Suppose you need to quickly allocate small structures and release them afterwards. And say within these small structures you had some static data which didn't change. The normally DOS/Windows allocation methods take a large amound of time and then you still have the problem of initialising the static parts of each and every structure.

The solution is to use a stack and pre-allocate all the structures you will need before you need them, initialising the static parts where needed. As each structure is allocated push it's address onto a small stack, this is just a list of structure addresses.

Now when you want to use one of these structures simply pop the top address from the allocation-stack.

Because you are not using DOS/Windows to allocate each structure and intialise the static parts of each structure this will be far, far quicker.

When building the allocation-stack you may wish to push 0000 as the first item to indicate when no more structures are available, or you could use a simple counter.

Recursion and "Fake-Stacks"

Sometimes you will need to use recursion. This is usually need for the classic flood-fill algorithm. The danger with recursion is that the stack isn't big enough and so crashes or system lock-ups can occur.

But in most/all situations a "fake-stack" can be used instead of calling the routine from within itself. This not only works out faster in most cases but uses less memory because you don't need the space for each and every return address. Also you don't need to rely on DOS/Windows or your assembler to allocate enough stack space because you can allocate it yourself and release it afterwards (something a recursive method couldn't easily do).

A stack is nothing more than an area of memory used as a buffer and a variable used as an address pointer to tell us where the next, new item should be written to.

Closing Words

Well, I hope newbie coders have understood this. Stacks are very useful structures, without them we would have no sub-routines, no interrupts and programs would be so big that not even M1cr0$s0ft could release them.

The circular buffer technique is used for many things, one of the best uses for it is for keyboard buffers or event queues inside interrupt handlers.

Good luck.


TAD #:o)