Actions
=======

Not every action is permitted at every point in time. A matrix must be
constructed relating each action currently in progress to the actions
it disallows (partial list):

Load disallows cut, paste, tools, undo.
Save disallows cut, paste, tools, quit, undo.
Tool disallows undo, tools, show/hide markers.
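The matrix can be stored compactly as bitmasks. The following is a minimal sketch, assuming one bit per action; the `ACT_*` names, `struct action_rule` and `action_permitted()` are hypothetical, and only the three partial rows listed above are encoded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical action identifiers, one bit each. */
enum {
    ACT_LOAD    = 1 << 0,
    ACT_SAVE    = 1 << 1,
    ACT_TOOL    = 1 << 2,
    ACT_CUT     = 1 << 3,
    ACT_PASTE   = 1 << 4,
    ACT_UNDO    = 1 << 5,
    ACT_QUIT    = 1 << 6,
    ACT_MARKERS = 1 << 7   /* show/hide markers */
};

/* One row of the matrix: what is disallowed while `in_progress` runs. */
struct action_rule {
    unsigned in_progress;
    unsigned disallows;
};

static const struct action_rule rules[] = {
    { ACT_LOAD, ACT_CUT | ACT_PASTE | ACT_TOOL | ACT_UNDO },
    { ACT_SAVE, ACT_CUT | ACT_PASTE | ACT_TOOL | ACT_QUIT | ACT_UNDO },
    { ACT_TOOL, ACT_UNDO | ACT_TOOL | ACT_MARKERS },
};

/* True if `action` (a single bit) may start while the actions in the
   `running` bitmask are in progress. */
static bool action_permitted(unsigned running, unsigned action) {
    assert(action != 0 && (action & (action - 1)) == 0);
    for (size_t i = 0; i < sizeof rules / sizeof rules[0]; i++)
        if ((running & rules[i].in_progress) && (rules[i].disallows & action))
            return false;
    return true;
}
```

Each in-progress action contributes one row; an action is permitted only if no running action's row disallows it, so adding rows as the list grows does not change the lookup.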

Structure
=========

Sound is represented by the following structure:

sound[track[block[frame_count, 
                  peak cache, 
                  sample cache], 
            block[...]],
      track[...]]

That is, an N-channel sample is represented in memory as N tracks,
each of which contains a number of variable-size blocks that hold the
actual sample data. In addition to the sample data, these blocks
contain peak data for the samples in the sample cache. The peak data
simply consists of a high and a low value for every 128 samples in the
sample cache.
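As a rough illustration of this layout, here is a C sketch. The type and field names, the use of 16-bit `short` samples and the linked list of blocks are assumptions for illustration, not the actual definitions.

```c
#include <assert.h>
#include <stddef.h>

#define PEAK_WINDOW 128          /* frames summarised by one peak element */

typedef long frame_count_t;      /* stand-in for libaudiofile's AFframecount */

typedef struct {
    short low, high;             /* extremes of one PEAK_WINDOW-frame window */
} peak;

typedef struct block {
    frame_count_t frame_count;
    peak *peak_cache;            /* ceil(frame_count / PEAK_WINDOW) elements */
    short *sample_cache;         /* frame_count samples of one channel */
    struct block *next;          /* next block in the track */
} block;

typedef struct {
    block *blocks;               /* variable-size blocks, in order */
} track;

typedef struct {
    int channels;
    track *tracks;               /* one track per channel */
} sound;

/* Number of peak elements needed to describe n frames. */
static size_t peak_elements(frame_count_t n) {
    assert(n >= 0);
    return (size_t)((n + PEAK_WINDOW - 1) / PEAK_WINDOW);
}

/* Length of a track: the sum of its blocks' frame counts. */
static frame_count_t track_frame_count(const track *tr) {
    frame_count_t total = 0;
    for (const block *b = tr->blocks; b != NULL; b = b->next)
        total += b->frame_count;
    return total;
}
```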

The API then decomposes into five layers:

1. sound layer (snd_*)
2. track layer (track_*)
3. block layer (block_*)
4. blocklist layer (blocklist_*)
5. cache layer (cache_*)

user ------> sound ----------+
 |             |             |
 |             v             v
 +---------> track ---> blocklist
               |             |
               |             v
               +---------> block
               |             |
               |             v
               +---------> cache


1. sound layer (snd_*)

The primary purpose of this layer is to convert between the
interleaved sound format used by libaudiofile and sound devices and
the non-interleaved format that this API works with.
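A minimal sketch of the conversion, assuming 16-bit samples and one destination buffer per channel; `deinterleave()` and `interleave()` are hypothetical helpers, not part of libaudiofile or this API.

```c
#include <assert.h>
#include <stddef.h>

/* Split an interleaved buffer (L R L R ...) into one buffer per channel. */
static void deinterleave(const short *src, short **dst, int channels,
                         size_t frames) {
    assert(channels > 0);
    for (size_t f = 0; f < frames; f++)
        for (int ch = 0; ch < channels; ch++)
            dst[ch][f] = src[f * channels + ch];
}

/* Inverse: merge per-channel buffers back into one interleaved buffer. */
static void interleave(short *const *src, short *dst, int channels,
                       size_t frames) {
    assert(channels > 0);
    for (size_t f = 0; f < frames; f++)
        for (int ch = 0; ch < channels; ch++)
            dst[f * channels + ch] = src[ch][f];
}
```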

2. track layer (track_*)

The track layer provides a means to interface with the data stored in
the peak and sample caches. Its primary purpose is to present a
contiguous, flat view of a track by stitching the constituent blocks
together as required. For sample caches this involves simple
concatenation, but for peak caches some extra work is necessary.

3. block layer (block_*)

A block is a thin wrapper around a peak cache and a sample cache. It
has the necessary knowledge to drive the cache layer, i.e. it knows
how to split and join peak caches, but very little else.

4. blocklist layer (blocklist_*)

The blocklist layer manages a track's list of blocks and provides fast
functions for mapping frame offsets to blocks.
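The text does not say how the mapping is made fast. One possible approach, sketched here under the assumption that the blocklist keeps the starting frame of each block in a sorted array, is a binary search, which finds the block in O(log n) steps. `blocklist` and `blocklist_find()` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical blocklist: starts[i] is the track frame at which block i
   begins (sorted, starts[0] == 0); frame_counts[i] is its length. */
typedef struct {
    const long *starts;
    const long *frame_counts;
    size_t count;
} blocklist;

/* Map a track frame offset to a block index and the offset within that
   block. Returns -1 if the offset lies past the end of the track. */
static long blocklist_find(const blocklist *bl, long offset,
                           long *block_offset) {
    assert(offset >= 0);
    size_t lo = 0, hi = bl->count;     /* search in [lo, hi) */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (bl->starts[mid] <= offset)
            lo = mid + 1;
        else
            hi = mid;
    }
    /* lo is now the first block that starts after offset. */
    if (lo == 0)
        return -1;
    size_t i = lo - 1;
    if (offset >= bl->starts[i] + bl->frame_counts[i])
        return -1;
    *block_offset = offset - bl->starts[i];
    return (long)i;
}
```

The trade-off is that the array of start offsets must be updated whenever blocks are split, joined, inserted or deleted.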

5. cache layer (cache_*)

A cache has no sound-specific knowledge at all. Its job is simply to
store bytes and return them. There are two types of cache: REAL
(backed by real memory) and NULL, a special kind of cache that has a
size but consumes no memory and always returns zeroes (silence).
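The difference between the two cache types can be sketched as follows. The `cache` layout and `cache_read()` are hypothetical; the point is only that a NULL cache reports a size without holding any memory and fills reads with zeroes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum { CACHE_REAL, CACHE_NULL } cache_type;

/* Hypothetical cache: a REAL cache owns `size` bytes of memory; a NULL
   cache has the same logical size but no backing store. */
typedef struct {
    cache_type type;
    size_t size;
    unsigned char *data;   /* NULL for CACHE_NULL */
} cache;

/* Copy up to sz bytes starting at offset into dst; returns the number of
   bytes copied. A NULL cache always yields zeroes (silence). */
static size_t cache_read(const cache *c, void *dst, size_t offset,
                         size_t sz) {
    assert(c->type == CACHE_NULL || c->data != NULL);
    if (offset >= c->size)
        return 0;
    if (sz > c->size - offset)
        sz = c->size - offset;        /* clip at the end of the cache */
    if (c->type == CACHE_NULL)
        memset(dst, 0, sz);
    else
        memcpy(dst, c->data + offset, sz);
    return sz;
}
```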

Peak cache
==========

When drawing a sample, we can make use of the fact that displays are
small relative to the size of an audio sample. That is, on a 1024x768
display, we never need to draw more than 1024 samples, assuming a
scaling factor of 1:1. For a 16 bit stereo sample this is only 4K of
data (1024 frames * 2 channels * 2 bytes), which takes very little
time to process. But as the scaling factor increases, the work
quickly starts to overwhelm us (e.g. at a scaling factor of 1:128 the
number of bytes that we need to look at for every redraw is already
half a megabyte, and the delays start to become noticeable).

Thus, in order to keep drawing quick at large scaling factors, we need
to reduce the amount of work done at drawing time. The peak cache does
this: it contains precalculated high/low pairs for every 128 samples in
the corresponding sample cache.
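Computing the peak data for a block might look like this minimal sketch, which assumes 16-bit samples; `peak` and `peaks_compute()` are hypothetical names. The final element covers fewer than 128 samples when the frame count is not a multiple of 128.

```c
#include <assert.h>
#include <stddef.h>

#define PEAK_WINDOW 128

typedef struct { short low, high; } peak;

/* Compute one high/low pair for every PEAK_WINDOW samples. Returns the
   number of peak elements written to out. */
static size_t peaks_compute(const short *samples, size_t n, peak *out) {
    assert(n == 0 || (samples != NULL && out != NULL));
    size_t count = 0;
    for (size_t start = 0; start < n; start += PEAK_WINDOW) {
        size_t end = start + PEAK_WINDOW < n ? start + PEAK_WINDOW : n;
        peak p = { samples[start], samples[start] };
        for (size_t i = start + 1; i < end; i++) {
            if (samples[i] < p.low)  p.low = samples[i];
            if (samples[i] > p.high) p.high = samples[i];
        }
        out[count++] = p;
    }
    return count;
}
```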

When something needs to be drawn at a scaling factor below 128, we
derive the image from the sample cache, giving us the most accurate
picture. However, when something needs to be drawn at a scaling factor
of 128 or above, we derive the image from the peak cache, thus
reducing the amount of work 128-fold in exchange for slightly less
accuracy.
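The choice between the two sources can be sketched per pixel column. This assumes a single block (no stitching), 16-bit samples and a scaling factor given as frames per pixel; `column_extent()` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define PEAK_WINDOW 128

typedef struct { short low, high; } peak;

/* Low/high extent of pixel column x when `scale` frames map to one pixel. */
static peak column_extent(const short *samples, const peak *peaks,
                          size_t frames, size_t scale, size_t x) {
    assert(scale > 0);
    size_t start = x * scale;
    assert(start < frames);
    size_t end = start + scale < frames ? start + scale : frames;
    peak r;
    if (scale < PEAK_WINDOW) {
        /* Fine zoom: scan the raw samples for this column. */
        r.low = r.high = samples[start];
        for (size_t i = start + 1; i < end; i++) {
            if (samples[i] < r.low)  r.low = samples[i];
            if (samples[i] > r.high) r.high = samples[i];
        }
    } else {
        /* Coarse zoom: merge whole peak elements. Windows are aligned to
           PEAK_WINDOW, so a column that starts or ends mid-window takes in
           a few extra frames (the "slightly less accuracy"). */
        size_t first = start / PEAK_WINDOW, last = (end - 1) / PEAK_WINDOW;
        r = peaks[first];
        for (size_t e = first + 1; e <= last; e++) {
            if (peaks[e].low < r.low)   r.low = peaks[e].low;
            if (peaks[e].high > r.high) r.high = peaks[e].high;
        }
    }
    return r;
}
```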

Constructing the peak cache is easy, because we can ensure that every
block (except the final one) has a frame count that is divisible by
128. However, because we may split blocks on non-128-sample
boundaries, we must be aware of the possibility that a single peak
cache element describes fewer than 128 samples.

One consequence of this is that any peak data that we stitch together
from the peak cache may not exactly represent the underlying
samples. E.g. when you have two blocks chained like this:

 +---------------------+     +----------------------+
 | block 1, 64 samples |()-()| block 2, 300 samples |
 | 1 peak element      |     | 3 peak elements      |
 +---------------------+     +----------------------+

A request for the peaks of samples 128 - 384 will then actually return
the peaks for samples 64 - 320 (the first two peak elements of block
2), because a peak element, being just a high/low pair, cannot be
further broken down. The practical impact of this error is limited,
because requests of only 256 frames are very rare. More typically,
assuming a 1024x768 display at a scaling factor of 1:128 (below this we
don't use the peak cache at all), the request will be 1024 * 128 =
131072 frames. An error of 127 frames (the maximum error) is then just
under 0.1% (127 / 131072), and at higher scaling factors the error
becomes rapidly smaller.
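The example above can be reproduced with a small model of the stitched view. This sketch assumes that the track treats the peak data as one flat array, so that stitched element i is simply the i-th peak element across all blocks; `stitched_element_frames()` is hypothetical. It reports which frames a stitched element really describes.

```c
#include <assert.h>
#include <stddef.h>

#define PEAK_WINDOW 128

static size_t peak_elements(size_t frames) {
    return (frames + PEAK_WINDOW - 1) / PEAK_WINDOW;
}

/* Report the frames [*start, *end) that stitched element i actually
   describes. Returns 0 if i is past the last element. */
static int stitched_element_frames(const size_t *block_frames,
                                   size_t nblocks, size_t i,
                                   size_t *start, size_t *end) {
    assert(block_frames != NULL || nblocks == 0);
    size_t base = 0;                   /* first frame of current block */
    for (size_t b = 0; b < nblocks; b++) {
        size_t n = peak_elements(block_frames[b]);
        if (i < n) {
            size_t block_end = base + block_frames[b];
            *start = base + i * PEAK_WINDOW;
            *end = *start + PEAK_WINDOW < block_end ? *start + PEAK_WINDOW
                                                    : block_end;
            return 1;
        }
        i -= n;
        base += block_frames[b];
    }
    return 0;
}
```

For the blocks above (64 and 300 frames), stitched elements 1 and 2 describe frames 64 - 192 and 192 - 320, which is the 64 - 320 range mentioned in the text.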

When splitting a block, if the split point is not divisible by 128,
you must recalculate the last element in the peak cache for the first
block, and recalculate the entire peak cache for the second block.
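A sketch of the split, assuming 16-bit samples and that the first block keeps using the original peak array; `peak_of()` and `peaks_split()` are hypothetical. When the split point is a multiple of 128, the second block's elements are simply moved.

```c
#include <assert.h>
#include <stddef.h>

#define PEAK_WINDOW 128

typedef struct { short low, high; } peak;

/* High/low pair for samples[0 .. n). */
static peak peak_of(const short *samples, size_t n) {
    assert(n > 0);
    peak p = { samples[0], samples[0] };
    for (size_t i = 1; i < n; i++) {
        if (samples[i] < p.low)  p.low = samples[i];
        if (samples[i] > p.high) p.high = samples[i];
    }
    return p;
}

/* Split a block of `frames` samples at frame `at`. `peaks` holds the
   original elements and keeps the first block's; `second` receives the
   second block's elements. */
static void peaks_split(const short *samples, size_t frames, size_t at,
                        peak *peaks, size_t *nfirst,
                        peak *second, size_t *nsecond) {
    assert(at > 0 && at < frames);
    size_t total = (frames + PEAK_WINDOW - 1) / PEAK_WINDOW;
    *nfirst = (at + PEAK_WINDOW - 1) / PEAK_WINDOW;
    if (at % PEAK_WINDOW == 0) {
        /* Aligned split: the tail elements are still valid; move them. */
        *nsecond = total - *nfirst;
        for (size_t e = 0; e < *nsecond; e++)
            second[e] = peaks[*nfirst + e];
        return;
    }
    /* First block: only its last (now partial) element changes. */
    size_t last = *nfirst - 1, from = last * PEAK_WINDOW;
    peaks[last] = peak_of(samples + from, at - from);
    /* Second block: windows now start at `at`, so rebuild them all. */
    *nsecond = 0;
    for (size_t s = at; s < frames; s += PEAK_WINDOW) {
        size_t n = frames - s < PEAK_WINDOW ? frames - s : PEAK_WINDOW;
        second[(*nsecond)++] = peak_of(samples + s, n);
    }
}
```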

When joining two blocks, if the block 1 frame count is not divisible
by 128, then the final peak cache element of the first block is
discarded, the sample caches are joined, and the peak cache is
recalculated over the range from the start of the discarded element
(block 1 frame count, rounded down to a multiple of 128) to (block 1 +
block 2 frame count).
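A corresponding sketch of the join, under the same assumptions (hypothetical `peak_of()` and `peaks_join()`, 16-bit samples, caller-provided storage). Recalculation starts at the first frame of the discarded element, since that is where the joined block's next 128-frame window begins.

```c
#include <assert.h>
#include <stddef.h>

#define PEAK_WINDOW 128

typedef struct { short low, high; } peak;

/* High/low pair for samples[0 .. n). */
static peak peak_of(const short *samples, size_t n) {
    assert(n > 0);
    peak p = { samples[0], samples[0] };
    for (size_t i = 1; i < n; i++) {
        if (samples[i] < p.low)  p.low = samples[i];
        if (samples[i] > p.high) p.high = samples[i];
    }
    return p;
}

/* `samples` is the joined sample cache (n1 frames of block 1 followed by
   n2 frames of block 2); `peaks` holds block 1's elements and must have
   room for ceil((n1 + n2) / PEAK_WINDOW) elements; `peaks2` holds block
   2's elements. Returns the joined element count. */
static size_t peaks_join(const short *samples, size_t n1, size_t n2,
                         peak *peaks, const peak *peaks2) {
    size_t keep = n1 / PEAK_WINDOW;    /* complete elements of block 1 */
    size_t total = n1 + n2;
    if (n1 % PEAK_WINDOW == 0) {
        /* Aligned: block 2's elements stay valid and are appended. */
        size_t e2 = (n2 + PEAK_WINDOW - 1) / PEAK_WINDOW;
        for (size_t e = 0; e < e2; e++)
            peaks[keep + e] = peaks2[e];
        return keep + e2;
    }
    /* Drop block 1's partial final element and recompute from its first
       frame (n1 rounded down to a window boundary) to the end. */
    size_t count = keep;
    for (size_t s = keep * PEAK_WINDOW; s < total; s += PEAK_WINDOW) {
        size_t n = total - s < PEAK_WINDOW ? total - s : PEAK_WINDOW;
        peaks[count++] = peak_of(samples + s, n);
    }
    return count;
}
```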

track/cache interaction notes
=============================

size_t
cache_fill(cache *c,
           void *src,
           size_t offset,
           size_t sz);

This function copies sz bytes from the buffer src to offset offset in
cache c, and returns the number of bytes actually written (which may be
smaller than sz).
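A minimal sketch of cache_fill for a REAL cache. The signature is the one given above; the `cache` layout, with a per-byte validity flag so that a later lookup can tell filled bytes from gaps, is an assumption. Writes are clipped at the end of the cache, which is why the return value can be smaller than sz.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical REAL cache: `size` bytes of storage plus one flag per byte
   recording whether that byte holds valid data. */
typedef struct {
    unsigned char *data;
    unsigned char *valid;
    size_t size;
} cache;

size_t
cache_fill(cache *c,
           void *src,
           size_t offset,
           size_t sz) {
    assert(c != NULL && src != NULL);
    if (offset >= c->size)
        return 0;
    if (sz > c->size - offset)
        sz = c->size - offset;        /* clip at the end of the cache */
    memcpy(c->data + offset, src, sz);
    memset(c->valid + offset, 1, sz); /* mark these bytes as filled */
    return sz;                        /* bytes actually written */
}
```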

void
cache_find(cache *c,
           void *dst,
           size_t *offset,
           size_t *sz);

This function copies sz bytes from offset offset in cache c to the
buffer dst. On return, offset and sz are set to the actual offset
where data was found (always greater than or equal to the requested
offset) and the number of bytes actually copied (always less than or
equal to the requested size).
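A sketch of cache_find under the same assumed `cache` layout (per-byte validity flags). The text does not say where in dst the data lands when the returned offset has moved; this sketch assumes it is stored at the same relative position, dst + (found offset - requested offset), which is what the track-level logic described further on appears to rely on.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct {
    unsigned char *data;
    unsigned char *valid;   /* one flag per byte */
    size_t size;
} cache;

void
cache_find(cache *c,
           void *dst,
           size_t *offset,
           size_t *sz) {
    assert(c != NULL && dst != NULL);
    size_t req = *offset;
    if (req >= c->size) {               /* entirely past the end */
        *sz = 0;
        return;
    }
    size_t end = req + *sz < c->size ? req + *sz : c->size;
    size_t start = req;
    while (start < end && !c->valid[start])
        start++;                        /* skip a leading gap */
    size_t stop = start;
    while (stop < end && c->valid[stop])
        stop++;                         /* extend the contiguous run */
    memcpy((unsigned char *)dst + (start - req), c->data + start,
           stop - start);
    *offset = start;
    *sz = stop - start;                 /* 0 if nothing was found */
}
```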

void
track_cache_fill(track *tr,
                 void *bits,
                 AFframecount frame_offset,
                 AFframecount frame_count);

This function copies frame_count frames from the buffer bits into the
cache at offset frame_offset.

First, we need to find the block whose cache stores the frame given by
frame_offset. We can do this by starting at the first block and
checking how many frames it stores. If it stores more frames than our
frame_offset, we have found the correct block. Otherwise, we subtract
the block's frame_count from the required offset, and repeat the
procedure for the next block.

When the correct block has been found, we have a diminished
frame_offset that specifies the offset into the cache for the found
block, and an unchanged frame_count. We then call cache_fill with
these parameters (converted from frames to bytes), and it returns the
number of bytes actually written to the cache. We advance the bits
pointer by this number of bytes to get the proper offset into the
source buffer, convert the number back to frames, and subtract it from
frame_count. The frame_offset becomes 0. Then we get the next block
and repeat the procedure, until either there are no blocks left or
frame_count has reached zero. If frame_count is non-zero and there are
no blocks left, then the cache is dropping frames for some reason and
we notify the user.
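The procedure can be sketched in C as follows. `AFframecount` is redefined locally as a stand-in, frames are assumed to be 16-bit mono (FRAME_SIZE 2), only the sample cache is filled, and the block/track structures and the minimal `cache_fill()` are hypothetical. Dropped frames are reported on stderr.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

typedef long long AFframecount;   /* stand-in for libaudiofile's type */
#define FRAME_SIZE 2              /* assumption: 16-bit mono frames */

/* Hypothetical structures: each block owns a byte cache sized exactly
   frame_count * FRAME_SIZE; a track is a linked list of blocks. */
typedef struct { unsigned char *data; size_t size; } cache;
typedef struct block { AFframecount frame_count; cache c; struct block *next; } block;
typedef struct { block *first; } track;

/* Minimal cache_fill: copy up to sz bytes, clipped at the cache end. */
static size_t cache_fill(cache *c, void *src, size_t offset, size_t sz) {
    if (offset >= c->size)
        return 0;
    if (sz > c->size - offset)
        sz = c->size - offset;
    memcpy(c->data + offset, src, sz);
    return sz;
}

void
track_cache_fill(track *tr,
                 void *bits,
                 AFframecount frame_offset,
                 AFframecount frame_count) {
    assert(frame_offset >= 0 && frame_count >= 0);
    unsigned char *src = bits;
    block *b = tr->first;
    /* Find the block holding frame_offset by subtracting block lengths. */
    while (b != NULL && frame_offset >= b->frame_count) {
        frame_offset -= b->frame_count;
        b = b->next;
    }
    for (; b != NULL && frame_count > 0; b = b->next) {
        size_t written = cache_fill(&b->c, src,
                                    (size_t)frame_offset * FRAME_SIZE,
                                    (size_t)frame_count * FRAME_SIZE);
        src += written;                        /* advance in bits */
        frame_count -= (AFframecount)(written / FRAME_SIZE);
        frame_offset = 0;                      /* next block: from its start */
    }
    if (frame_count > 0)
        fprintf(stderr, "track_cache_fill: dropping %lld frames\n",
                frame_count);
}
```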

void
track_cache_find(track *tr,
                 void *bits,
                 AFframecount *frame_offset,
                 AFframecount *frame_count);

This function copies frame_count frames from offset frame_offset in
the cache into the buffer bits. On return, the frame_offset and
frame_count values specify the frame_offset and the frame_count that
still need to be filled (i.e. that could not be located in the cache).

Again, first we need to find the block that contains the frame
specified by the requested frame_offset, as above.

When the correct block has been found, we have a diminished
frame_offset that specifies the offset into the cache for the found
block, and an unchanged frame_count. We then call cache_find with
these parameters (converted from frames to bytes). 

Now the problem is that we need to return a contiguous block of data,
with the additional constraint that either its start must equal the
requested start, or its end must equal the requested end. That is, in
the case of a partial cache miss, we need the cache either to return
data from the beginning (new frame_offset == frame_offset, i.e.
frame_offset unchanged), or up until the end (new frame_offset -
frame_offset + new frame_count == frame_count). This is because we
need to be able to satisfy any remaining data requirements in a single
read; we cannot do that if we need to fill the buffer "around" the data
that was returned from the cache.
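The constraint amounts to a simple check on a partial hit. `hit_is_anchored()` is a hypothetical helper; all values are in frames.

```c
#include <assert.h>
#include <stdbool.h>

typedef long long AFframecount;   /* stand-in for libaudiofile's type */

/* A partial hit is usable only if it is anchored at one end of the
   request: either it starts at the requested offset, or it runs up to
   the requested end. Otherwise the remaining frames would lie on both
   sides of it and could not be fetched with a single read. */
static bool hit_is_anchored(AFframecount req_offset, AFframecount req_count,
                            AFframecount got_offset, AFframecount got_count) {
    assert(got_offset >= req_offset);
    return got_offset == req_offset ||
           got_offset - req_offset + got_count == req_count;
}
```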

So after calling cache_find, we need to check the returned
frame_offset. If the returned frame_offset is unchanged, then we know
that the bits buffer has been filled from the start. We can then add
the returned frame_count to the frame_offset, adjust the offset into
the bits buffer, subtract the returned frame_count from the
frame_count, and repeat the procedure until the cache returns either a
frame_count of zero (i.e. no more data in the cache) or a frame_offset
that is not equal to the frame_offset that was given (i.e. we stumbled
onto a gap in the cache).

Otherwise, if the frame_offset returned by cache_find() has changed,
then we cannot allow any more cache gaps or misses to occur (otherwise
the bits buffer would be filled neither from the start nor up to the
end). So if the frame_offset has changed, we check whether the returned
frame_offset - frame_offset + returned frame_count equals the requested
frame_count (in which case we are done), or whether the returned
frame_offset + returned frame_count equals the block's frame_count
(indicating that the returned data runs up to the end of this block's
cache). In the latter case, we advance past the returned data, reduce
the frame_count still needed accordingly, and repeat the procedure for
the next block, starting at offset 0. If any of the cache retrievals
for any of the subsequent blocks fails (meaning the cache cannot fully
satisfy a request), then the entire request fails and we did all our
work for nothing. C'est la vie.
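Putting the pieces together, track_cache_find might be sketched as below. This is a hedged illustration, not the actual implementation: the structures, `block_find()`, `tail_fill()` and the compact `cache_find()` are hypothetical, frames are assumed to be 16-bit mono, and on failure this sketch keeps any frames already filled from the start and reports everything after them as still needed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef long long AFframecount;   /* stand-in for libaudiofile's type */
#define FRAME_SIZE 2              /* assumption: 16-bit mono frames */

/* Hypothetical structures: a byte cache with one validity flag per byte,
   and a track as a singly linked list of blocks. */
typedef struct { unsigned char *data, *valid; size_t size; } cache;
typedef struct block { AFframecount frame_count; cache c; struct block *next; } block;
typedef struct { block *first; } track;

/* Compact cache_find: copy the first run of valid bytes in the range to
   the same relative position in dst; report its start and length. */
static void cache_find(cache *c, void *dst, size_t *offset, size_t *sz) {
    assert(*offset < c->size);
    size_t req = *offset, start = req, stop;
    size_t end = req + *sz < c->size ? req + *sz : c->size;
    while (start < end && !c->valid[start])
        start++;
    for (stop = start; stop < end && c->valid[stop]; stop++)
        ;
    memcpy((unsigned char *)dst + (start - req), c->data + start, stop - start);
    *offset = start;
    *sz = stop - start;
}

/* Frame-based wrapper around cache_find for one block. */
static void block_find(block *b, unsigned char *dst,
                       AFframecount *off, AFframecount *n) {
    size_t o = (size_t)*off * FRAME_SIZE, s = (size_t)*n * FRAME_SIZE;
    cache_find(&b->c, dst, &o, &s);
    *off = (AFframecount)(o / FRAME_SIZE);
    *n = (AFframecount)(s / FRAME_SIZE);
}

/* A hit at got_off > off was found in block b. Succeeds only if it, plus
   hits at offset 0 of the following blocks, runs without a gap to the end
   of the request (count frames starting at off). */
static int tail_fill(block *b, unsigned char *dst, AFframecount off,
                     AFframecount count, AFframecount got_off,
                     AFframecount got) {
    AFframecount filled_to = got_off - off + got;
    while (filled_to < count) {
        if (got_off + got != b->frame_count || b->next == NULL)
            return 0;                  /* gap before the end: fail */
        b = b->next;
        got_off = 0;
        got = count - filled_to;
        block_find(b, dst + filled_to * FRAME_SIZE, &got_off, &got);
        if (got_off != 0)
            return 0;                  /* gap at the start of this block */
        filled_to += got;
    }
    return 1;
}

void
track_cache_find(track *tr,
                 void *bits,
                 AFframecount *frame_offset,
                 AFframecount *frame_count) {
    unsigned char *dst = bits;
    AFframecount off = *frame_offset;
    block *b = tr->first;
    while (b != NULL && off >= b->frame_count) {   /* locate the block */
        off -= b->frame_count;
        b = b->next;
    }
    while (b != NULL && *frame_count > 0) {
        AFframecount got_off = off, got = *frame_count;
        block_find(b, dst, &got_off, &got);
        if (got == 0)
            return;                    /* nothing more in the cache */
        if (got_off != off) {
            /* Hit after a gap: only usable if it reaches the end. */
            if (tail_fill(b, dst, off, *frame_count, got_off, got))
                *frame_count = got_off - off;   /* only the gap remains */
            return;
        }
        /* Hit at the requested start: consume it and carry on. */
        dst += got * FRAME_SIZE;
        off += got;
        *frame_offset += got;
        *frame_count -= got;
        if (off == b->frame_count) {
            b = b->next;
            off = 0;
        }
    }
}
```

For example, with two 4-frame blocks where only frame 1 of the first block is missing, a request for 6 frames at offset 0 leaves frame_offset = 1 and frame_count = 1: frame 0 is filled from the start, and frames 2 - 5 run without a gap to the end of the request.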

