NAME
    Apache::Filter - Alter the output of previous handlers

SYNOPSIS
      #### In httpd.conf:
      PerlModule Apache::Filter
      # That's it - this isn't a handler.
      
      <Files ~ "\.blah$">
       SetHandler perl-script
       PerlSetVar Filter On
       PerlHandler Filter1 Filter2 Filter3
      </Files>
      
      #### In Filter1, Filter2, and Filter3:
      my $fh = $r->filter_input();
      while (<$fh>) {
        s/ something / something else /;
        print;
      }
      
      #### or, alternatively:
      my ($fh, $status) = $r->filter_input();
      return $status unless $status == OK;  # The Apache::Constants OK
      while (<$fh>) {
        s/ something / something else /;
        print;
      }

DESCRIPTION
    Each of the handlers Filter1, Filter2, and Filter3 will make a call to
    $r->filter_input(), which will return a filehandle. For Filter1, the
    filehandle points to the requested file. For Filter2, the filehandle
    contains whatever Filter1 wrote to STDOUT. For Filter3, it contains
    whatever Filter2 wrote to STDOUT. The output of Filter3 goes directly to
    the browser.
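
    The data flow described above can be imitated outside of Apache with
    in-memory filehandles. The following is only a sketch of the idea: the
    run_chain() helper and the three toy filters are made up for
    illustration, and this is not how Apache::Filter itself is implemented.

```perl
use strict;
use warnings;

# Hypothetical helper: run each filter in order, handing it the previous
# filter's output as its input. It only mimics the data flow of a chain.
sub run_chain {
    my ($input, @filters) = @_;
    for my $filter (@filters) {
        my $output = '';
        open(my $in,  '<', \$input)  or die "cannot read buffer: $!";
        open(my $out, '>', \$output) or die "cannot write buffer: $!";
        $filter->($in, $out);
        close $out;
        close $in;
        $input = $output;    # becomes the next filter's input
    }
    return $input;
}

# Three toy filters, listed in forward order like the PerlHandler line.
my $filter1 = sub { my ($in, $out) = @_; while (<$in>) { s/cat/dog/g; print {$out} $_ } };
my $filter2 = sub { my ($in, $out) = @_; while (<$in>) { print {$out} uc($_) } };
my $filter3 = sub { my ($in, $out) = @_; while (<$in>) { print {$out} "> $_" } };

# Prints "> THE DOG SAT"
print run_chain("the cat sat\n", $filter1, $filter2, $filter3);
```

    Swapping the order of the filters changes the result, which is why the
    order on the PerlHandler line matters.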

    Note that the modules Filter1, Filter2, and Filter3 are listed in
    forward order, in contrast to the reverse-order listing of
    Apache::OutputChain.

    When you've got this module, you can use the same handler both as a
    stand-alone handler, and as an element in a chain. Just make sure that
    whenever you're chaining, all the handlers in the chain are
    "Filter-aware," i.e. they each call $r->filter_input() exactly once,
    before they start printing to STDOUT. There should be almost no overhead
    for doing this when there's only one element in the chain.

    Currently the following public modules are Filter-aware. Please tell me
    of others you know about.

     Apache::Registry (using Apache::RegistryFilter, included here)
     Apache::SSI
     Apache::ASP
     HTML::Mason
     Apache::SimpleReplace

METHODS
    This module doesn't create an Apache handler class of its own. Rather,
    it adds some methods to the Apache class. Thus, it's really a mix-in
    package that just adds functionality to the $r request object.

    * $r->filter_input()
        This method will give you a filehandle that contains either the file
        requested by the user ($r->filename), or the output of a previous
        filter. If called in a scalar context, that filehandle is all you'll
        get back. If called in a list context, you'll also get an Apache
        status code (OK, NOT_FOUND, or FORBIDDEN) that tells you whether
        $r->filename was successfully found and opened. If it was not, the
        filehandle returned will be undef.

        If for some reason you have already opened the filehandle you'll
        want to read from, call `$r->filter_input(handle => $handle)', and
        `filter_input()' won't try to open any files. It will pass your
        handle back to you.

    * $r->changed_since($time)
        Returns true or false based on whether the current input seems to
        have changed since `$time'. Currently the criterion is this: if the
        file pointed to by `$r->finfo' hasn't changed since the time given,
        and if all previous filters in the chain are deterministic (see
        below), then we return false. Otherwise we return true.

        A caution: always call the `changed_since()' and `deterministic()'
        methods AFTER the `filter_input()' method. This is because
        Apache::Filter uses a crude counting method to figure out which
        handler in the chain is currently executing, and calling these
        routines out of order messes up the counting.
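
        The rule used by `changed_since()' can be restated as a small
        stand-alone function. This is only a sketch of the stated criterion:
        the function name and its arguments are hypothetical, and treating a
        modification time equal to `$since' as "unchanged" is an assumption.

```perl
use strict;
use warnings;

# Hypothetical restatement of the rule; not Apache::Filter's own code.
#   $mtime - last-modified time of the requested file (epoch seconds)
#   $since - the time the caller's cached copy was made
#   $flags - array ref of deterministic() flags for the earlier filters
sub input_changed_since {
    my ($mtime, $since, $flags) = @_;
    return 1 if $mtime > $since;          # the file changed on disk
    return 1 if grep { !$_ } @$flags;     # an earlier filter may vary
    return 0;                             # safe to reuse cached output
}

print input_changed_since(100, 200, [1, 1]) ? "changed\n" : "unchanged\n";
print input_changed_since(100, 200, [1, 0]) ? "changed\n" : "unchanged\n";
```

        The second call reports a change even though the file is old,
        because one earlier filter has not declared itself deterministic.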

    * $r->deterministic(1|0);
        As of version 0.07, the concept of a "deterministic" filter is
        supported. A deterministic filter is one whose output is entirely
        determined by the contents of its input file (whether the
        $r->filename file or the output of another filter), and doesn't
        depend at all on outside factors. For example, a filter that
        translates all its output to upper-case is deterministic, but a
        filter that adds a date stamp to a page, or looks things up in a
        database which may vary over time, is not.

        Why is this a big deal? Let's say you have the following setup:

         <Files ~ "\.boffo$">
          SetHandler perl-script
          PerlSetVar Filter On
          PerlHandler Apache::FormatNumbers Apache::DoBigCalculation
          # The above are fake modules, you get the idea
         </Files>

        Suppose the FormatNumbers module is deterministic, and the
        DoBigCalculation module takes a long time to run. Because
        FormatNumbers is deterministic, its output can only change when the
        input file changes on disk. The DoBigCalculation module can
        therefore cache its results: whenever the input file is unchanged,
        the data it receives from FormatNumbers is known to be the same as
        before, and it can use the cached results from a previous run.

        The guts of the modules would look something like this:

         sub Apache::FormatNumbers::handler {
            my $r = shift;
            $r->content_type("text/html");
            my ($fh, $status) = $r->filter_input();
            return $status unless $status == OK;
            $r->deterministic(1); # Set to true; default is false
            
            # ... do some formatting, print to STDOUT
            return OK;
         }
         
         sub Apache::DoBigCalculation::handler {
            my $r = shift;
            $r->content_type("text/html");
            my ($fh, $status) = $r->filter_input();
            return $status unless $status == OK;
            
            # This module implements a caching scheme by using the 
            # %cache_time and %cache_content hashes.
            my $time = $cache_time{$r->filename};
            my $output;
            if ($r->changed_since($time)) {
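                # Read from <$fh>, perform a big calculation on it, save the
                # result in $output, print it to STDOUT, and update
                # $cache_content{$r->filename} and $cache_time{$r->filename}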
            } else {
                print $cache_content{$r->filename};
            }
            
            return OK;
         }

HEADERS
    In order to make a decent web page, none of the filters should call
    $r->send_http_header(), or you'll get lots of headers all over your page.
    This is so obvious that the previous sentence should be a lot shorter.

    So the current solution is to have _none_ of the filters send the
    headers, and this module will send them for you when the last filter
    calls $r->filter_input(). You should still set up the content-type
    (using $r->content_type), and any other headers you want to send, before
    calling $r->filter_input(). filter_input will simply call
    $r->send_http_header() with no arguments to send whatever headers you
    have set.

    One downside of this is that all the filters in the stack will probably
    call $r->content_type, most of them for no reason, but c'est la vie. If
    anyone's got better ideas, don't hold them back.

NOTES
    You'll notice in the SYNOPSIS that I say `"PerlSetVar Filter On"'. That
    information isn't actually used by this module; it's used by modules
    which are themselves filters (like Apache::SSI). I hereby suggest that
    filtering modules use this parameter as the switch to detect whether
    they should call $r->filter_input.
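
    A handler following this suggestion might look roughly like the sketch
    below. My::UpperCase is a made-up module name, and the OK and NOT_FOUND
    constants are defined locally only so the file compiles outside
    mod_perl; a real handler would import them from Apache::Constants.

```perl
use strict;
use warnings;

package My::UpperCase;    # hypothetical filter module

# Local stand-ins for the Apache::Constants values (assumption: a real
# handler would say "use Apache::Constants qw(OK NOT_FOUND);" instead).
use constant OK        => 0;
use constant NOT_FOUND => 404;

sub handler {
    my $r = shift;
    $r->content_type('text/plain');    # set headers before filter_input()

    my ($fh, $status);
    if (lc($r->dir_config('Filter') || '') eq 'on') {
        # Chained: Apache::Filter supplies the input and sends the headers.
        ($fh, $status) = $r->filter_input();
        return $status unless $status == OK;
    } else {
        # Stand-alone: open the requested file and send headers ourselves.
        open($fh, '<', $r->filename) or return NOT_FOUND;
        $r->send_http_header();
    }

    while (my $line = <$fh>) {
        print uc $line;
    }
    return OK;
}

1;
```

    With `PerlSetVar Filter On' the handler takes part in a chain; without
    it, the same code serves the file directly.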

    VERY IMPORTANT: if one handler in a stacked handler chain uses
    `Apache::Filter', then THEY ALL MUST USE IT. This means they all must
    call $r->filter_input exactly once. Otherwise `Apache::Filter' couldn't
    capture the output of the handlers properly, and it wouldn't know when
    to release the output to the browser.

    The output of each filter (except the last) is accumulated in memory
    before it's passed to the next filter, so memory requirements are large
    for large pages. Apache::OutputChain only needs to keep one item from
    print()'s argument list in memory at a time, so it doesn't have this
    problem, but there are others (each chunk is filtered independently, so
    content spanning several chunks won't be properly parsed). In future
    versions I might find a way around this, or cache large pages to disk so
    memory requirements don't get out of hand. We'll see whether it's a
    problem.
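
    The trade-off between the two approaches can be seen in a small
    stand-alone sketch. The emphasize() filter and the sample chunks are
    invented for illustration; the code only imitates per-chunk versus
    whole-output filtering and does not use either module.

```perl
use strict;
use warnings;

# Toy filter: wrap every occurrence of "mod_perl" in asterisks.
sub emphasize {
    my ($text) = @_;
    $text =~ s/(mod_perl)/*$1*/g;
    return $text;
}

# One page, produced by two print() calls that happen to split the word.
my @chunks = ("Powered by mod_", "perl today\n");

# Per-chunk filtering: each chunk is filtered on its own, so the match
# that spans the chunk boundary is missed.
my $per_chunk = join '', map { emphasize($_) } @chunks;

# Whole-output filtering: buffer everything first, then filter once.
my $buffered = emphasize(join '', @chunks);

print "per chunk: $per_chunk";    # Powered by mod_perl today
print "buffered:  $buffered";     # Powered by *mod_perl* today
```

    Buffering costs memory in proportion to the page size, but it lets the
    filter see content that was split across several print() calls.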

    A couple of examples of filters are provided with this distribution in
    the t/ subdirectory: UC.pm converts all its input to upper-case, and
    Reverse.pm prints the lines of its input reversed.

    Finally, a caveat: in version 0.09 I started explicitly setting the
    Content-Length to undef inside $r->filter_input. This prevents early
    filters from incorrectly setting the content length, which will almost
    certainly be wrong if there are any filters after it. This means that if
    you write any filters which set the content length, they should do it
    after the $r->filter_input call.

TO DO
    Add a buffered mode to the final output, so that we can send a proper
    Content-Length header. [gozer@hbesoftware.com (Philippe M. Chiasson)]

BUGS
    This uses some funny stuff to figure out when the currently executing
    handler is the last handler in the chain. As a result, code that
    manipulates the handler list at runtime (using push_handlers and the
    like) might produce mayhem. Poke around a bit in the code before you try
    anything. Let me know if you have a better idea.

    As of 0.07, Apache::Filter will automatically return DECLINED when
    $r->filename points to a directory. This is just because in most cases this
    is what you want to do (so that mod_dir can take care of the request),
    and because figuring out the "right" way to handle directories seems
    pretty tough - the right way would allow a directory indexing handler to
    be a filter, which isn't possible now. Also, you can't properly pass
    control to a non-mod_perl indexer like mod_autoindex. Suggestions are
    welcome.

    I haven't considered what will happen if you use this and you haven't
    turned on PERL_STACKED_HANDLERS.

AUTHOR
    Ken Williams (ken@forum.swarthmore.edu)

COPYRIGHT
    Copyright 1998 Ken Williams. All rights reserved.

    This library is free software; you can redistribute it and/or modify it
    under the same terms as Perl itself.

SEE ALSO
    perl(1).

