


##########################################################
> OK, if this is the case, I'd prefer to have only one variable type -- an
> array -- so $r->dir_config will always return a list. So if it's a singular
> value, it'll be the first element. If you want it to be a hash -- you can
> have it as well. This makes things simple and even eliminates the need for
> PerlSetVar; you can have only PerlAddVar for everything.

my @values = $r->dir_config->get('Key');
and
my %hash = $r->dir_config->get('Key');

already does that, regardless of whether PerlSetVar or PerlAddVar was
used, although with PerlSetVar %hash would end up with an odd number of
elements.
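
For reference, the current behaviour can be sketched like this (a
hedged example; the key names and values are invented, mod_perl 1.x
API assumed):

```perl
# httpd.conf:
#   PerlAddVar Fruits apple
#   PerlAddVar Fruits banana
#
#   PerlAddVar Pairs  color
#   PerlAddVar Pairs  red

sub handler {
    my $r = shift;
    # all values for the key, in the order they were added
    my @values = $r->dir_config->get('Fruits');   # ('apple', 'banana')
    # even number of added values => usable as key/value pairs
    my %pairs  = $r->dir_config->get('Pairs');    # (color => 'red')
    # a key set once with PerlSetVar yields a single value, which is
    # why assigning get()'s result to a hash gives an odd element count
    ...
}
```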



##########################################################

This goes to debug

Doug said:


> I recently downloaded Apache_1.3.12 and installed it on Redhat 6.1,
> everything was working fine. I run into problems when I tried to install
> mod_perl-1.23. Everything was compiled ok, but I got error messages (see
> below) when I try to run the make test. I'm new to both Apache and mod_perl,
> thank you for your help in advance. 

> still waiting for server to warm up...............not ok

hmm, what happens if you run:

% make start_httpd
% GET http://localhost:8529/perl/perl-status
% make kill_httpd

?

you can also get more details on why Apache isn't responding, like so:

% make start_httpd
% strace -f -s1024 -o strace.out -p `cat t/logs/httpd.pid`
% make run_tests
% make kill_httpd

have a look at strace.out

The above is a much better example of starting strace without using
hardcoded names.

##########################################################
add this to performance.pod


Generating proper headers, so that documents which aren't modified
frequently can be cached by proxies, browsers and other Internet
caches, will allow you to substantially reduce the load on your
machines -- provided, again, that the content you serve doesn't have
to be unique for every request.

More info in correct_headers.pod
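
A minimal sketch of what such a handler might do (assuming the
mod_perl 1.x API, with Apache::File supplying set_last_modified();
the one-week lifetime is an arbitrary example value):

```perl
use Apache::Constants qw(OK);
use Apache::File ();    # adds set_last_modified() et al. to $r

sub handler {
    my $r = shift;
    $r->content_type('text/html');
    # let caches validate against the file's modification time
    $r->set_last_modified((stat $r->finfo)[9]);
    # and allow them to keep the response for up to a week
    $r->header_out('Cache-Control' => 'max-age=604800');
    $r->send_http_header;
    return OK if $r->header_only;
    # ... send the body ...
    return OK;
}
```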



##########################################################
add this to performance.pod

=head1 Code Unloading

We urge you to preload as much code as possible, as it helps to 
increase the amount of memory which is shared and so reduces the 
memory footprint.  But sometimes we want to unload code that was 
previously loaded.  For example, you could load many modules to do 
some configuration or initialization work at server startup, but 
none of the children will need these modules later.  As the code 
is no longer needed, you can unload it.

For example if you use C<XML::Parser> in a C<E<lt>PerlE<gt>> section
only, you could remove it with:

  delete $INC{'XML/Parser.pm'};
  Apache::PerlRun->flush_namespace('XML::Parser');



##########################################################

One day this should become a nice intro to the Perl
sharing/multithreading stuff.  I've added the section below but then
removed it, since Shane's reply includes flaws.  You don't want the
number of active processes to equal the number of processors, because
the CPU is not the only resource a process consumes.  If we let a
single process own a CPU and it goes to sleep on a lengthy IO from disk
or a db query, the CPU cycles will be wasted.

=head1 Multithreading or not Multithreading

I want to quote this question, which showed up on the mod_perl list,
and the answer, in their entirety.  By the way, this question was asked
before perl 5.6 and Apache 2.0 were released (both with multithreading
support).

John Henckel has asked:

Thanks for your help.  I am slowly coming to terms with the fact that
Perl is not multithreaded and so it will never be able to scale the
way I need it to.  It is only a couple thousand lines of code.  I will
have to rewrite it as a servlet in Java or else a C program using the
FastCGI protocol.

Many people think FCGI can't handle multiple concurrent requests;
however, that is only a Perl limitation.  The C-FCGI interface, with
multithreading, can process hundreds of CGI requests simultaneously in
ONE process with ONE socket to the Apache server.

Shane has replied:

Argh...!  Okay, let's back up.  Multithreading is not a good idea.  What
you're talking about is a process which takes 4 seconds to complete...;
what are we talking about, remote system communication?  You have a
process which takes 4 seconds to finish; it doesn't matter what you
use as your design... it still takes 4 seconds.  If you have 100
requests for a 4 second process... it's going to take 400 seconds.
(Actually, due to context switching, probably more like 500.)

Multithreading ADDs to, not subtracts from, the load.  But you see it all
depends on what sort of architecture you're going to use, too.  If you're
going to use a single processor machine, multithreading is going to
slow the whole process down.  If you're using a computer that has 100s
of processors, then clearly multithreading is the approach to take.

The Perl "limitation" of which you speak is not a limitation per
se.  It's only a limitation if you're using one instance of the Perl
interpreter.  But you are free to use more than one instance of that
interpreter.  (perldoc perlembed)  However, it makes zero sense to have
more interpreters than processors... due to the context switching issue.

The best way to solve the problem is to have one "processing" thread
per processor.  (I.e. the thread that does work on the request that is
supposed to take 4 seconds) That thread can be directly written in c,
or you can write some code for a perl interpretor to process.  No big
deal... just keep in mind, if you start opening up more processing
threads than processors you have, and start cramming requests down its
throat faster than once per four seconds, the thing is going to tumble
like dominos.  The best thing you could possibly do is this: set up an
engine of your own to handle this 4 second long process.  Initiate as
many threads as you have processors.  Start a queue of processes that
can be however long.  Hand out 4 second long processes to the queue.

This design will keep things from crashing.  Even better would be to
have a series of computers that grab requests from others.  I.e. setup
a central thread written in c, then have it act as a queue.
Processing threads grab new requests from that queue, and deliver
them.  This is of course based on your effort of running a process
that takes 4 seconds.

If this is about remote communication, which I have no idea why it
wouldn't be unless you're doing some strange number crunching, then what
you want to do is read up on select() and poll().  Or if you want to
have it work really well, read up on RT signal queues, and run on the
2.3.x Linux kernel, or some other Unix variant.

I'll reiterate... having more threads than processors in ANY language
is a bad idea, if it can be avoided.  Which it can if you use the
newer programming stuff.  Moving to Java will not solve your problems,
it will create 100x worse ones in terms of performance.  Moving to c
will not solve your problems.  Moving to Perl will not solve your
problem.  Investigating your problem clearly will help eventually
solve your problem.  (BTW: No CGI request should take 4 seconds to
process, unless you are querying a database on the other side of the
planet, or unless it's some sort of mathematics lab searching for prime
numbers or something.  I might sound like I'm joking..., but I most
certainly am not.)




########################################### 
An attempt to explain why memory goes unshared:

From: Stas Bekman <sbekman@stason.org>

Perl is a language with weak data types, i.e. you don't specify a
variable's size (type) as you do in strongly typed languages such as
C.  Therefore Perl uses heap memory, allocated on demand, rather than
the (unmodifiable) text and (modifiable) data memory pages used by C.
The latter get allocated when the program is loaded into memory,
before it starts to run; the former is allocated at run-time.

On the heap there is no separation between data and text pages: when
you call malloc() it just allocates the chunk you asked for.  This
leads to the case where the code (static in most cases) and the
variables (dynamic) land on the same memory pages.

From: shane@isupportlive.com


Well... the structs for the "Code Values" are mixed with both code
and variables in the case of lexicals.  However, during runtime these
are not altered.  There are structs within the main code value struct
that will be altered during run time, namely the recursive lexical
array.  (That's what I call it, but I'm sure Malcolm uses a more
correct word :->)  However, the actual code (a series of highly
optimized opcodes... instruction-set-workalike type stuff that takes a
few clock cycles each, depending on the op and the architecture)
is not memory inline with the structs for the recursive lexical array.

What I'm saying is that if you include all your code at the very
beginning, the design of perl will not alter that code, so it should be
allowed to be fixed and shared.  Basically it just holds an opcode
pointer as to which opcode it's working at within the CV.  The
recursive lexical array itself just has a pointer within the "code
value" struct to itself.  So basically that main Code struct should
never need to be realloc'd so it's fairly unlikely that it would need
to be non-shared.  However maybe someone that understands how
something is "unshared" within the kernel could be quite helpful.  If
you were to change where something was pointing within a struct would
that cause it to be unshared?  I think that it's fairly unlikely, but
I suppose it's possible.  If that's the case then it's quite likely
that code pieces could become unshared I suppose.  However the main
hunk of actual function opcodes would remain fixed, only the execution
pointer (where it's pointing at within the present program) would
change.  So, in final (!) the code should always be shared.  However
if you change the file and it checks the date on it and reloads it,
obviously it won't be shared :-).

> Perl is a language with weak data types, i.e. you don't specify a
> variable's size (type) as you do in strongly typed languages such
> as C.  Therefore Perl uses heap memory, allocated on demand, rather
> than the (unmodifiable) text and (modifiable) data memory pages
> used by C.  The latter get allocated when the program is loaded
> into memory, before it starts to run; the former is allocated at
> run-time.

Yes that's true.  There is some compile time stuff where it organizes
the variable names within the lexical array, but I'm not sure whether
or not it actually reserves space for those things at that time.  I'm
really not sure about that item.

> On the heap there is no separation between data and text pages:
> when you call malloc() it just allocates the chunk you asked for.
> This leads to the case where the code (static in most cases) and
> the variables (dynamic) land on the same memory pages.

This is a weak area in my knowledge.  I'm not certain how the kernel
actually marks segments as shared and not... so I'll refrain from
commenting.
 
> I'm not sure this a very good explanation. Perl gurus are very
> welcome to correct/improve my attempt to explain this.

I've tried to explain what I can.  The best book for this is "Advanced
Perl Programming", published by O'Reilly (of course).  There is a
chapter in there written by Malcolm Beattie (well, pieces of the
chapter) that is pretty good..., but I'm afraid it might not go
into enough depth on these exact issues.  Not only that, but these are
also very kernel-related..., you have to understand how both
pieces fit together, and frankly I couldn't answer that, and I don't
know a person alive who could :-).  (I'm sure there are some, but
who?)


########################################### 

Ged has written this:

"=head2 URI Translation

For many reasons, a server can never allow access to its entire
directory hierarchy.  Although there is really no indication of this  
given to the Web browser, the path given in a requested URI is   
therefore a virtual path, and early in the processing of a request the
virtual path given in the request must be translated to a path rooted
at the filesystem root so that Apache can determine what resource is
really being requested.  This path can be considered to be a physical
path, although it may not physically exist.

Especially in mod_perl systems, you may I<intend> that the translated
path does not physically exist, because your module responds when it
sees a request for this non-existent path by sending a virtual
document.  It writes this document "on the fly", especially for this
request, and the document then vanishes.  Many of the documents you   
see on the Web, for example most documents which change their
appearance depending on what the browser asks for, do not physically
exist.  This is one of the most important features of the Web, and it
is one of the great powers of mod_perl that it allows you complete
flexibility to create virtual documents."


He also said:
> Anyway, I think it needs to be said somewhere, and
> preferably so that the reader will see it before he sees things like  
> the Alias directives.  It kinda gets assumed by all the other docs and
> newbies flounder around without grasping the concept.  I speak from
> personal experience!
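
In mod_perl terms, the translation phase Ged describes can be taken
over with a PerlTransHandler.  A hypothetical sketch (the package name
and paths are made up):

```perl
package My::Trans;
use Apache::Constants qw(OK DECLINED);

# Translate the virtual /docs/ hierarchy to a real directory;
# decline everything else so Apache's default translation runs.
sub handler {
    my $r = shift;
    my ($rest) = $r->uri =~ m{^/docs/(.*)} or return DECLINED;
    $r->filename("/home/httpd/virtual-docs/$rest");
    return OK;
}
1;
__END__
# httpd.conf:
#   PerlTransHandler My::Trans
```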


###########################################

> > META: Why cover IO::File when Symbol is quicker & lighter?
> >
> > It provides other functionality as well. Do you think that it might be
> > better just to mention it and not delve into the details?
> 
> Either just mention it, or alternatively put Symbol first (with a note
> regarding 5.6.0) and then follow with IO::File, with a leading remark
> like "In some situations you may want to take a fully object oriented
> approach to file handling."  And then have the IO::File stuff.
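
For context, the two styles being compared look roughly like this
(plain-Perl sketch; the file name is arbitrary):

```perl
# lightweight: an anonymous glob from the standard Symbol module
use Symbol ();
my $fh = Symbol::gensym();
open $fh, "/etc/motd" or die "open: $!";

# fully object oriented: IO::File
use IO::File ();
my $io = IO::File->new("/etc/motd", "r") or die "open: $!";

# note: from perl 5.6.0 on, `open my $fh, "/etc/motd"' autovivifies
# the handle, which makes Symbol::gensym() largely unnecessary
```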

###########################################
You must rewrite this section of performance.pod:

=head2 Preload Perl modules - Real Numbers

It's crap!

First define the goal of the testing, and probably use GTop to make the
process easier.
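
E.g. the measurement could be sketched with GTop (a hedged example;
libgtop must be installed, and the module being preloaded is a
placeholder):

```perl
use GTop ();

my $gtop   = GTop->new;
my $before = $gtop->proc_mem($$)->size;   # this process's size in bytes

require Some::Big::Module;   # hypothetical module whose cost we measure

my $after = $gtop->proc_mem($$)->size;
printf "loading consumed %d bytes\n", $after - $before;
```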

###########################################

debug.pod:

Collect all these notes about stack traces generation (here in this file)

The stack trace looks right.  It would be more useful to see the line
numbers, which you can see if you follow this tip for building mod_perl
from the SUPPORT doc:

=item CORE DUMPS

If you get a core dump, please send a backtrace if possible.
Before you try, build mod_perl with perl Makefile.PL PERL_DEBUG=1
which will:
 -add `-g' to EXTRA_CFLAGS
 -turn on PERL_TRACE
 -set PERL_DESTRUCT_LEVEL=2 (additional checks during Perl cleanup)
 -link against libperld if it exists

-----------------

Apache::DB/ httpd -X -D DEBUG

> if you set OPTIMIZE => '-g' in the Makefile.PL and start httpd under gdb,
> it's easy to debug.
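
A gdb session along those lines might look like this (a sketch; the
paths are illustrative):

```shell
% gdb /usr/local/apache/bin/httpd
(gdb) run -X -D DEBUG
# ...reproduce the problem, wait for the SIGSEGV...
(gdb) bt
```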




###########################################

eagle book appendix B lists all the Makefile.PL options (we'd better
include them all)

###########################################

A reader suggested:

> It might be helpful to replicate some information about the Apache Life
> Cycle (Eagle book pg 56) before talking about the server startup file.

config.pod

###########################################

debug: =head3 Safe Resource Locking

You must review the section and correct this issue:

If the file is reopened on the next script invocation, the previous fh
will be closed and unlocked, but only from within the same process.

Regarding leaks: if you use open IN, ..., there is probably no leak
because it's the same handle.  If gensym or IO::File is used, you
should check this issue.

All the above is tentative and should be validated.

The section also seems to be duplicated in a few places.

I think the Symbol material is duplicated in two places!

###########################################

From: "Joseph R. Junkin" <jjunkin@datacrawler.com>
Subject: Re: [summary+rfc] When One Machine is not Enough...

I doubt it will help, but you are free to look at a talk I gave:

Analysis of the Open Source Application Platform
http://www.datacrawler.com/talk/tech/platform/

It addresses the scalability issues you have already covered.


###########################################

die() issue:

merge the 
porting.html#die_and_mod_perl
snippets.html#Redirecting_Errors_to_the_Client

add notes from Matt's email: 
Warning: $SIG{__DIE__} considered dangerous
regarding of eval {} if $@ try/catch use

###########################################

Makefile.PL: build.pl gets called twice when running from the CPAN
shell, because it does 'make' and 'make install' and both call
'manifypods'.  Need to add some state control, so that if manifypods
was called once for 'make', it won't be called again for 'make install'.


###########################################
(merge of status.pod and debug.pod)

I'm thinking of merging the Apache::Status and the Debug sections, since
the two are closely related and Apache::Status allows you to debug the
code to some extent.


#####################################################################

Stick this para at the beginning of the performance chapter.

One of the most important issues in improving performance is the
reduction of memory usage.  The less memory each server uses, the more
server processes you can start, and thus the more performance you have
(from the user's point of view, the speed of response).

See L<Global vs Fully Qualified
Variables|performance/Global_vs_Fully_Qualified_Variab>

See L<Memory "leakages"|performance/Memory_leakage>

#####################################################################


Important for both the book and the guide: the strategy chapter talks
about performance improvement, among other things.  The performance
chapter doesn't mention (refer to) it, but this is a very important
part of it.

#####################################################################

Include the very important performance tuning notes from:

http://www.apache.org/docs/misc/perf-tuning.html
http://www.apache.org/docs/misc/perf.html

#####################################################################

add benchmarks with keep-alive and without it!


#####################################################################

Mention this package in the debug section.

Devel::Symdump - dump symbol names or the symbol table

Apache::Symdump - shows 

Apache::Status uses it to show the process's internals


#####################################################################

Describe the BackLog (performance...)

On that note you might want to set the ListenBacklog directive; it
depends on whether you want users to wait indefinitely or just get an
error.
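
The directive being half-remembered here is presumably Apache's
ListenBacklog, e.g.:

```apache
# httpd.conf: maximum length of the queue of pending connections
# (511 is the compiled-in default; the kernel may cap it lower)
ListenBacklog 511
```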




#####################################################################

> > What is the best way to have a Location directive apply to an entire
> > site except for a single directory?
> 
> Set the site-wide handler in a <Location "/"> and override the handler
> for the "register" dir by setting the default handler in <Location
> "/register">.  Unfortunately, I don't know the name of the default
> handler.

   SetHandler default-handler
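
Put together, the suggestion looks like this (My::Handler is a made-up
module name):

```apache
<Location />
    SetHandler perl-script
    PerlHandler My::Handler
</Location>

# serve this directory the ordinary Apache way
<Location /register>
    SetHandler default-handler
</Location>
```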



#####################################################################

META: add a section about setting and passing environment variables:
It includes and merges (PerlSetVar, SetVar and Pass*), %ENV, (creating
your own directives?), subprocess

Notes:


* I'd suggest using $r->subprocess_env() instead.
I guess %ENV will work in many situations, but it might bite you later
when you can't figure out why a particular env variable isn't getting set
in certain situations (speaking from experience).


* I was going to suggest that too.  %ENV controls the environment
of the currently running Perl process, but child processes come from
the "subprocess env", which only the call above sets.
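
For reference, a sketch of the suggested call (mod_perl 1.x API; the
variable name and value are invented):

```perl
my $r = Apache->request;

# visible to subprocesses (CGI scripts, SSI `exec', piped logs, ...)
$r->subprocess_env(SITE_ROOT => '/home/httpd/site');

# called with no arguments, it populates the table with the standard
# CGI environment variables
$r->subprocess_env;
```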

#######################################################################

Add a MOD_PERL_TRACE=all example...

An email:

> > > Any suggestions?  How might I debug this?
> > 
> > hmm, can you put a warn() trace in your sub SiteMap, I wonder if it's
> > called the first time, but util.pm is not reloaded when Apache restarts
> > itself on startup.  
> > any difference if you turn Off PerlFreshRestart?
> > is mod_perl configured as a dso or static?
> > 
> > -Doug
> 
> mod_perl is static (my initial message included commands I used to build
> mod_perl/apache).
> 
> PerlFreshRestart Off  has no effect.
> 
> It does look like it's failing to load on the second pass, though, since I
> get one response from the "warn" you suggested:
> 
>       # bin/httpd -X
>       util.pm: MSELproxy::util about to bootstrap MSELproxy::util ...
>       [Fri Oct  1 00:43:05 1999] null: ...saw SiteMap...
>       Syntax error on line 14 of /usr/local/apache/conf/perl.conf:
>       Invalid command 'SiteMap', perhaps mis-spelled or defined by a
>       module not included in the server configuration



... more evidence ...  output of 
# MOD_PERL_TRACE=all bin/httpd -X

perl_parse args: '/dev/null' ...allocating perl interpreter...ok
constructing perl interpreter...ok
ok
running perl interpreter...ok
mod_perl: 0 END blocks encountered during server startup
perl_cmd_require: conf/perl-startup.pl
attempting to require `conf/perl-startup.pl'
loading perl module 'Apache::Constants::Exports'...ok
loading perl module 'Apache'...ok
loading perl module 'MSELproxy::util'...[Fri Oct  1 00:54:26 1999]
        util.pm: MSELproxy::util about to bootstrap MSELproxy::util ...
ok
loading perl module 'Apache'...ok
loading perl module 'MSELproxy::AccessManager'...ok
loading perl module 'Apache'...ok
loading perl module 'MSELproxy::OCLC'...ok
loading perl module 'Apache'...ok
loading perl module 'MSELproxy::RLG'...ok
blessing cmd_parms=(0xbfffdb2c)
[Fri Oct  1 00:54:26 1999] null: ...saw SiteMap...              <---
[root@pembroke apache]# loading perl module 'Apache'...ok
perl_startup: perl aleady running...ok
loading perl module 'Apache'...ok
cmd_cleanup: SvREFCNT($MSELproxy::util::$obj) == 1
cmd_cleanup: SvREFCNT($MSELproxy::util::$obj) == 1
loading perl module 'Apache'...ok
perl_cmd_require: conf/perl-startup.pl
attempting to require `conf/perl-startup.pl'
loading perl module 'Apache'...ok
loading perl module 'MSELproxy::util'...ok
loading perl module 'Apache'...ok
loading perl module 'MSELproxy::AccessManager'...ok
loading perl module 'Apache'...ok
loading perl module 'MSELproxy::OCLC'...ok
loading perl module 'Apache'...ok
loading perl module 'MSELproxy::RLG'...ok
Syntax error on line 14 of /usr/local/apache/conf/perl.conf:
Invalid command 'SiteMap', perhaps mis-spelled or defined by a module not
included in the server configuration

#######################################################################

This is stuff to be integrated into the DB section, mostly by jwb:

Date: Thu, 14 Oct 1999 17:21:18 -0700
From: Jeffrey Baker <jwb@cp.net>
To: modperl@apache.org, dbi-users@isc.org
Cc: sbekman@iil.intel.com
Subject: More on web application performance with DBI

Hi all,

I have converted my text-only guide to web application performance
using mod_perl and DBI into HTML.  The guide now lives alongside my
DBI examples page at http://www.saturn5.com/~jwb/dbi-performance.html
.

I have also conducted a silly benchmark to see how all of these
optimization affect performance.  Please remember that it is dangerous
to extrapolate the results of a benchmark, especially one as
rudimentary as this.  With that said please consider the following
data.

Environment:
DB Server: Oracle 8.0.6, Sun Ultra2, 2 CPUs, 2GB RAM, Sun A1000 disks
App Server: Linux, PII 350, 128MB RAM, Apache 1.3.6, mod_perl 1.19
Benchmark Client: ApacheBench on same machine as application server

Each benchmark consisted of a single request selecting one row from
the database with a randomly selected primary key.  The benchmark was
run through 1000 requests with 10 simultaneous clients.  The results
were recorded using each level of optimization from my tutorial.

Zero optimization: 41.67 requests/second
Stage 1 (persistent connections): 140.17 requests/second
Stage 2 (bound parameters): 139.20 requests/second
Stage 3 (persistent statement handles): 251.13 requests/second

It is interesting that the Stage 2 optimization didn't gain anything
over Stage 1.  I think this is because of the relative simplicity of
my query, the small size of the test database (1000 rows), and the
lack of other clients connecting to the database at the same time.  In
a real application, the cache thrashing that is caused by dynamic SQL
statements would probably be detrimental to performance.  In any case
Stage 2 paves the way for Stage 3, which certainly does increase the
request rate!

So, check it out at http://www.saturn5.com/~jwb/dbi-performance.html
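
The three stages correspond roughly to the following DBI idioms (a
sketch; the DSN, credentials, table and column names are invented):

```perl
use DBI ();
# Stage 1: load Apache::DBI at server startup so connect() below
# transparently reuses a cached, persistent connection
# use Apache::DBI ();

my $dbh = DBI->connect('dbi:Oracle:testdb', 'user', 'passwd',
                       { RaiseError => 1 });

my $id = int rand 1000;   # random primary key, as in the benchmark

# Stage 2: a placeholder keeps the SQL text constant across requests;
# Stage 3: prepare_cached() reuses the statement handle as well
my $sth = $dbh->prepare_cached('SELECT name FROM users WHERE id = ?');
$sth->execute($id);
my ($name) = $sth->fetchrow_array;
$sth->finish;
```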

Date: Wed, 23 Feb 2000 23:18:23 -0800
From: Jeffrey W. Baker <jwbaker@acm.org>
To: modperl@apache.org
Subject: Re: Database connection pooling... (beyond Apache::DBI)

Greg Stark wrote:
> 
> Sean Chittenden <sean@serverninjas.com> writes:
> 
> >       Howdy.  We're all probably pretty familiar with Apache::DBI and
> > the fact that it opens a database connection per apache process.  Sounds
> > groovy and works well with only one or two servers.  Everything is gravy
> > until you get a cluster of servers, ie 20-30 machines, each with 300+
> > processes.
> 
> 300+ perl processes per machine? No way. The only way that would make _any_
> sense is if your perl code is extremely i/o dependent and your perl code is
> extremely light. Even then you're way better off having the i/o operations
> queued quickly and processed asynchronously.

This conversation happens on an approximately biweekly schedule,
either on modperl or dbi-users, or some other list I have the
misfortune of frequenting.  Please allow me to expand upon this
subject a bit.

I have not yet gotten a satisfactory answer from anyone who starts
these threads regarding why they want connection pooling.  I suspect
that people think it is needed because everyone else (Netscape,
Microsoft, Bea) is doing it.  There is a particular kind of
application where pooled connections are useful, and there are
particular situations where it is a waste.  Every project I have ever
done falls into the latter category, and I can only think of a few
cases that fall under the former.

Connection pooling is a system where your application server threads
or processes, which number n on a single machine, share a pool of
database connections which number fewer than n.  This is done to
minimize the number of database connections which are open at once,
which in turn is supposed to reduce the load on the database server.
This is effective when database activity is a small fraction of the
total load on the application server.  For example, if your
application server mostly performs matrix manipulation, and only
occassionally hits the database, it would make sense for it to
relinquish the connection when it is not in use.

The downside to connection pooling is that it imposes some overhead.
The connections must be managed properly, and the scheme should be
transparent to the programmer whose code is using it.  So when a piece
of code requests a database connection, the connection manager needs
to decide which one to return.  It may have to wait for one to free
up, or it may have to open one based on some low-water-mark heuristic.
It may also need to decide that a connection consumer has died or gone
away, possibly taking the connection with it.  So you can see that
opening a pooled connection is more computationally expensive than
opening a dedicated connection.

This pooling overhead is a total waste of time when the majority of
what your application is doing is database-related.  If your program
will issue 100 queries and perform a transaction during the course of
fulfilling a request, pooled connections will not make sense.  The
reason is that Apache already provides a mechanism for killing off
database connections in this scenario.  If a process or thread is
sitting about idle, Apache will come along and terminate it, freeing
the database connection in the process.  For database-bound or
transactional programs, the one-to-one mapping of processes to
database connections is ideal.

Pooling is also less attractive because modern databases can handle
many connections.  Oracle with MTS will run fine with just as many
connections as you care to open.  The application designer should
study how many connections he realistically plans to open.  If your
application is bound by database performance, it makes sense to cap
the number of clients, so you would not allow your applications to open
too many connections.  If your application is transactional, you don't
have any choice but to give each process its own dedicated
connection.  If your application is compute-bound, then your database
is lightly loaded and you won't mind opening a lot of connections.

The summary is that if your application is database-bound, or is
processing transactions, you don't need or even want connection
pooling.



###################################

=> Security

It's a good idea to protect your various monitors, like perl-status and
the like, with a password.  The less information you provide to
intruders, the harder their break-in task will be!  (One of the biggest
helps you can give these bad guys is showing them all the scripts you
use, if some of them are in the public domain -- and they can find out
most of them just by browsing your site.  The moment they know the name
of a script, they can grab its source from the web site the script came
from, study it, and probably find a few or even many security holes.)
Security by obscurity doesn't really work against a determined
intruder, but it definitely helps to wave away some of the less
determined malicious fellas.

e.g:

<Location /sys-monitor>
  SetHandler perl-script
  PerlHandler Apache::VMonitor
  AuthUserFile /home/httpd/perl/.htpasswd
  AuthGroupFile /dev/null
  AuthName "SH Admin"
  AuthType Basic
  <Limit GET POST>
    require user foo bar
  </Limit>
</Location>

And the passwd file:
  /home/httpd/perl/.htpasswd:
  foo:1SA3h/d27mCp
  bar:WbWQhZM3m4kl
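
The password file itself can be generated with Apache's htpasswd
utility (it prompts for each password):

```shell
% htpasswd -c /home/httpd/perl/.htpasswd foo    # -c creates the file
% htpasswd /home/httpd/perl/.htpasswd bar       # adds another user
```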

###################################


> There's nothing wrong with Ralf's guide per se, but I think
> you should mention in your "Adding a proxy server" section that
> mod_rewrite might be necessary if dynamic content is intermixed
> with static content.

That sounds reasonable indeed.  I'll add it.  Don't understand me wrong --
I'm not against adding more things, I'm against duplication, which
creates a mess.  So now that you've made it clear we need this, I'll
certainly add it.

Would you add something about using mod_rewrite to handle my scenario
to the guide?

Perhaps what you're looking for resembles this:

RewriteRule ^/(images|static)/ - [S=1]
RewriteRule (.+) http://backend$1 [P,L]

John D Groenveld wrote:
> 
> I've been using mod_proxy
> to proxypass my static content away from my /modperl
> directories. Now, I'd like to make my root
> dynamic and thus pass everything except /images and
> /static.
> I've looked at the guide and tuning  docs, as well
> as the mod_proxy docs, but I must be missing
> something.


###################################

Just a snippet to try...

try this (in the mod_perl-x.xx directory):

% make start_httpd
% strace -o strace.out -p `cat t/logs/httpd.pid` &
% make run_tests
% grep open strace.out | grep .htaccess > send_to_modperl_list
% make kill_httpd

and send us that file.  I have the feeling there's a .htaccess in your
tree that the process can't read.

###################################

Apache::RegistryNG is just waiting for more people to bang on it.  So if
you make your module a sub-class of Apache::RegistryNG, that will help
things move forward a bit :)

###################################

Put this in the strategy section (work on it first):

=head1 REDUCING THE NUMBER OF LARGE PROCESSES

Unfortunately, simply reducing the size of each HTTPD process is not
enough on a very busy site.  You also need to reduce the quantity of
these processes.  This reduces memory consumption even more, and
results in fewer processes fighting for the attention of the CPU.  If
you can reduce the quantity of processes until they all fit into RAM,
your response time improves even more.

The idea of the techniques outlined below is to offload the normal   
document delivery (such as static HTML and GIF files) from the
mod_perl HTTPD, and let it only handle the mod_perl requests.  This 
way, your large mod_perl HTTPD processes are not tied up delivering
simple content when a smaller process could perform the same job more
efficiently.

In the techniques below where there are two HTTPD configurations, the 
same httpd executable can be used for both configurations; there is no
need to build HTTPD both with and without mod_perl compiled into it.
With Apache 1.3 this can be done with the DSO configuration -- just  
configure one httpd invocation to dynamically load mod_perl and the 
other not to do so.  
 
These approaches work best when most of the requests are for static
content rather than mod_perl programs.  Log file analysis becomes a bit
of a challenge when you have multiple servers running on the same
host, since you must log to different files.

=head2 TWO MACHINES

The simplest way is to put all static content on one machine, and all
mod_perl programs on another.  The only trick is to make sure all
links are properly coded to refer to the proper host.  The static
content will be served up by lots of small HTTPD processes (configured
I<not> to use mod_perl), and the relatively few mod_perl requests
can be handled by the smaller number of large HTTPD processes on the
other machine.

The drawback is that you must maintain two machines, and this can get
expensive.  For extremely large projects, this is the best way to go.

=head2 TWO IP ADDRESSES

Similar to above, but one HTTPD runs bound to one IP address, while
the other runs bound to another IP address.  The only difference is 
that one machine runs both servers.  Total memory usage is reduced 
because the majority of files are served by the smaller HTTPD
processes, so there are fewer large mod_perl HTTPD processes sitting
around.

This is accomplished using the F<httpd.conf> directive C<BindAddress> 
to make each HTTPD respond only to one IP address on this host.  One
will have mod_perl enabled, and the other will not.
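
For example (the IP addresses here are made up for illustration; each
server gets its own configuration file, as in the later examples):

 # httpd.conf -- the plain server, no mod_perl
 Port 80
 BindAddress 1.2.3.1

 # httpd+perl.conf -- the mod_perl-enabled server
 Port 80
 BindAddress 1.2.3.2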

=head2 USING ProxyPass WITH TWO SERVERS

To overcome the limitation of the alternate port above, you can use
dual Apache HTTPD servers with just a slight difference in
configuration.  Essentially, you set up two servers just as you would
with the two-ports-on-one-IP-address method above.  However, in your
primary HTTPD configuration you add a line like this:

 ProxyPass /programs http://localhost:8042/programs

Where your mod_perl enabled HTTPD is running on port 8042, and has
only the directory F<programs> within its DocumentRoot.  This assumes 
that you have included the mod_proxy module in your server when it was
built.

Now, when you access http://www.domain.com/programs/printenv it will
internally be passed through to your HTTPD running on port 8042 as the
URL http://localhost:8042/programs/printenv and the result relayed
back transparently.  To the client, it all seems as if it is just one
server running.  This can also be used on the dual-host version to
hide the second server from view if desired.
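
Sketched as configuration excerpts (the port follows the example above,
the paths are illustrative only; C<ProxyPassReverse>, available in
Apache 1.3, keeps redirects pointing at the front-end server):

 # main httpd.conf (port 80, no mod_perl)
 ProxyPass        /programs http://localhost:8042/programs
 ProxyPassReverse /programs http://localhost:8042/programs

 # httpd+perl.conf (mod_perl enabled)
 Port 8042
 Alias /programs/ /var/www/programs/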

=begin html
<P>
A complete configuration example of this technique is provided by
two HTTPD configuration files.
<A HREF="httpd.conf.txt">httpd.conf</A> is for the main server for all
regular pages, and <A HREF="httpd%2bperl.conf.txt">httpd+perl.conf</A> is
for the mod_perl programs accessed in the <CODE>/programs</CODE> URL. 


The directory structure assumes that F</var/www/documents> is the
C<DocumentRoot> directory, and the mod_perl programs are in
F</var/www/programs> and F</var/www/rprograms>.  I start them as
follows:

 daemon httpd
 daemon httpd -f conf/httpd+perl.conf

=head2 SQUID ACCELERATOR

Another approach to reducing the number of large HTTPD processes on
one machine is to use an accelerator such as Squid (which can be found
at http://squid.nlanr.net/Squid/ on the web) between the clients and
your large mod_perl HTTPD processes.  The idea here is that squid will
handle the static objects from its cache while the HTTPD processes 
will handle mostly just the mod_perl requests once the cache is
primed.  This reduces the number of HTTPD processes and thus reduces
the amount of memory used.

To set this up, just install the current version of Squid (at this  
writing, this is version 1.1.22) and use the RunAccel script to start
it.  You will need to reconfigure your HTTPD to use an alternate port,
such as 8042, rather than its default port 80.  To do this, you can
either change the F<httpd.conf> line C<Port> or add a C<Listen>   
directive to match the port specified in the F<squid.conf> file.  
Your URLs do not need to change.  The benefit of using the C<Listen>  
directive is that redirected URLs will still use the default port 80
rather than your alternate port, which might reveal your real server 
location to the outside world and bypass the accelerator.

In the F<squid.conf> file, you will probably want to add C<programs>
and C<perl> to the C<cache_stoplist> parameter so that these are
always passed through to the HTTPD server under the assumption that
they always produce different results.
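
The relevant excerpts might look like this (Squid 1.1 syntax, matching
the alternate port used above):

 # squid.conf -- never serve these URLs from the cache
 cache_stoplist programs perl

 # httpd.conf -- answer on the alternate port, but keep Port 80
 # so self-referential URLs still point at the accelerator
 Port 80
 Listen 8042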

This is very similar to the two-port ProxyPass version above, but the
Squid cache may be more flexible to fine tune for dynamic documents
that do not change on every view.  The Squid proxy server also seems
to be more stable and robust than the Apache 1.2.4 proxy module.

One drawback to using this accelerator is that the logfiles will   
always report access from IP address 127.0.0.1, which is the local
host loopback address.  Also, any access permissions or other user
tracking that requires the remote IP address will always see the local
address.  The following code uses a feature of recent mod_perl
versions (tested with mod_perl 1.16 and Apache 1.3.3) to trick Apache
into logging the real client address and giving that information to
mod_perl programs for their purposes.


First, in your F<startup.perl> file add the following code:

 use Apache::Constants qw(OK);

 sub My::SquidRemoteAddr ($) {
   my $r = shift;

   if (my ($ip) = $r->header_in('X-Forwarded-For') =~ /([^,\s]+)$/) {
     $r->connection->remote_ip($ip);
   }

   return OK;
 }

Next, add this to your F<httpd.conf> file:

 PerlPostReadRequestHandler My::SquidRemoteAddr

This will cause every request to have its C<remote_ip> address
overridden by the value set in the C<X-Forwarded-For> header added by
Squid.  Note that if you have multiple proxies between the client and
the server, you want the IP address of the last machine before your
accelerator.  This will be the right-most address in the
C<X-Forwarded-For> header (assuming the other proxies append their
addresses to this same header, as Squid does).
   
If you use Apache with mod_proxy at your frontend, you can use Ask
Bjørn Hansen's mod_proxy_add_forward module from
ftp://ftp.netcetera.dk/pub/apache/ to make it insert the
C<X-Forwarded-For> header.
  
###################################

config.pod:

use Eric's presentation:

http://conferences.oreilly.com/cd/apache/presentations/echolet/contents.html

###################################

mod_perl Humour.

* mod_perl for embedded devices:

Q: mod_perl for my Palm Pilot dumps core when built as a DSO, and
the Palm lacks the memory to build statically, what should I do?

A: you should get another Palm Pilot to act as a reverse proxy

by Eric Cholet.

#################################################

DBI tips to improve performance:

Need to work on the snippets below:


What if the user_id has something that needs to be quoted?  I speak
of the general case.  User data should not get anywhere *near* an SQL
line... it should always be inserted via placeholders or very very
careful consideration to quoting.
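
As a sketch only -- C<$dbh> must be an already-connected DBI handle and
the C<users> table is hypothetical -- the placeholder approach looks
like this:

```perl
use strict;
use warnings;

# Sketch: $dbh is assumed to be a connected DBI handle; the
# users table is made up for illustration.
sub fetch_name {
    my ($dbh, $user_id) = @_;

    # Unsafe (don't do this): user data interpolated straight into SQL
    #   my $sth = $dbh->prepare("SELECT name FROM users WHERE id = $user_id");

    # Safe: a placeholder; DBI and the driver take care of quoting
    my $sth = $dbh->prepare('SELECT name FROM users WHERE id = ?');
    $sth->execute($user_id);
    my ($name) = $sth->fetchrow_array;
    $sth->finish;
    return $name;
}
```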


Ahh, I see. I basically do the latter, with $dbh->quote. The contents of
$Session are entirely system-generated. The user gives a ticket through
the URL, yes, but that is parsed and validated and checked for presence in
the DB before you even get to code that works like I had described.

I agree - but you should always be aware of the issues with using
placeholders for the database engine that you use. Sybase in
particular has a deficient implementation, which tends to run out of
space and creates locking contention. Using stored procs instead is a
lot better (although it doesn't solve the quoting problems).

OTOH, Oracle caches compiled SQL, and using placeholders means it's not
caching SQL with specific data in it. The values can get bound into the
compiled SQL just as easily, and it speeds things up by a noticeable amount
(factor of ~3 in my tests)

While we are on this topic, I have a few questions. I've just read the DBI
manpage; there is a prepare_cached() call. It's useless under mod_cgi if
each statement is prepared only once per script. But if I use Apache::DBI
and replace all prepare() statements (which include placeholders) with
prepare_cached(), does it mean that, like with module preloading, the
prepare will be called only once per unique statement through the whole
life of the child?

Otherwise using placeholders is useless if you do only one execute()
call per unique prepare() statement. The only benefit is that DBI takes
care of quoting the values for you.

I don't remember anyone ever mentioning prepare_cached(). What's the
verdict?



Simply adding the "_cached" to "prepare()" in one of my utilities
increased the performance eight fold (Oracle non-mod_perl environment).

I don't know the fine points of whether it is possible to share cached
prepares across children (can you even fork with db connections?), but
if your code is doing the same query (or queries) over and over,
definitely give it a try.

Not necessarily; it depends on your database. Oracle does caching which
persists until it needs the space for something else; if you're finding
information about customers, it's much more efficient for there to be one
entry in the library cache like this:

        select * from customers where customer_id = :p1

than it is for there to be lots of them like:

        select * from customers where customer_id = 123
        select * from customers where customer_id = 465
        select * from customers where customer_id = 789

since Oracle has to parse, compile and cache each one separately.

I don't know if other databases do this kind of caching. 

Ok, this makes sense. I've just read the MySQL manual -- to my grief, it
doesn't cache :(

So I still intend to use prepare_cached() to cache on the DBI side, but
it's said to work through the life of $dbh, and since my $dbh is a my()
lexical variable, I don't understand whether I get this benefit or not.
I know that Apache::DBI maintains a pool of connections; does it
preserve the cache of prepared statements as well (i.e. does it preserve
the whole $dbh object)?  If it does, I get a speedup at least for the
whole life of a single connection.  I think that speedup is even better
than the one you have been talking about, since if only Oracle caches
the prepared statement, DBI still has to reach out to Oracle; with a
local cache we save a little more.

Has anyone deployed the scenario I have tried to present here? It seems
like a good candidate for the performance chapter of the guide if it
really improves speed...

The statement cursors will be cached per $dbh, which Apache::DBI
caches, so there is an extreme performance boost... as your
application runs, caching all its cursors, database queries reduce to
pure execution; no query parsing is involved anymore.

On Oracle, I saw a 100% performance improvement by using the
prepare_cached functionality.

If you have just a small number of web servers, the caching difference
between Oracle & MySQL will be small on the db end.  It's when you have
a lot of DBI handles that things might get inefficient.  But I'm 
sure you are running a proxy front end, right Stas? :)

Be warned: there are some pitfalls associated with prepare_cached().
It actually gives you a reference to the *same* cached statement
handle, not just a similar copy.  So you can't do this:

my $sth1 = $dbh->prepare_cached('select name from table where id=?');
my $sth2 = $dbh->prepare_cached('select name from table where id=?');

$sth1 & $sth2 are now the same object!  If you try to use them
independently, they'll stomp all over each other.
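
The pitfall can be illustrated without a database. Here is a pure-Perl
sketch of the same caching behavior (my_prepare_cached() is a stand-in
for illustration, not the real DBI call):

```perl
use strict;
use warnings;

my %cache;

# Stand-in for DBI's prepare_cached(): the same SQL string always
# returns the *same* reference (here a plain hash, not a real handle).
sub my_prepare_cached {
    my $sql = shift;
    return $cache{$sql} ||= { sql => $sql, row => 0 };
}

my $sth1 = my_prepare_cached('select name from table where id=?');
my $sth2 = my_prepare_cached('select name from table where id=?');

# They are one object: touching $sth1 is visible through $sth2.
$sth1->{row} = 5;
print +($sth1 == $sth2 ? "same handle" : "copies"), "\n";
print "sth2 row: $sth2->{row}\n";
```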

That said, prepare_cached() can be a huge win when using a slow
database like Oracle.  For mysql, it doesn't seem to help much, since
mysql is so darn fast at preparing its statements.

Sometimes you have to be careful about that, yes.  For instance, I was
repeatedly executing a statement to insert data into a varchar column.  The
first value to insert just happened to be a number, so DBD::mysql thought that
it was a numeric column, and subsequent insertions failed using that same
statement handle.

I'm not sure what the correct solution should have been in that case, but I
reverted back to calling $dbh->quote($val) and putting it directly into the
SQL.  My opinion is that mysql should do a better job of figuring out which
fields are actually numeric and which are strings - i.e. get the info from the
database, not from the format of the data I'm passing it.



Actually, I'm a big fan of placeholders.  I think they make the
programming task a lot easier, since you don't have to worry about
quoting data values.  They can also be quite nice when you've got
values in a nice data structure and you want to pass them all to the
database - just put them in the bound-vars list, and forget about
constructing some big SQL string.

I believe mysql just emulates true placeholders by doing the quoting,
etc. behind the scenes.  So it's probably not much faster to use
placeholders than direct embedded values.  But I think placeholders
are cleaner, generally, and more fun.

In my experience, prepare_cached() is just a judgment call.  It hasn't
seemed to be a big performance win for mysql, so sometimes I use it,
sometimes I don't.  I always use it with Oracle, though.

prepare_cached is implemented by the database handle (and really the
database itself).  For example, in Oracle it speeds things up.  In MySQL,
it is exactly the same as prepare() because DBD::mysql does not implement
it because MySQL itself has no mechanism for doing this.

As I said in a previous message, prepare_cached() doesn't cache anything
under MySQL.  However, you can implement your own statement handle caching
scheme pretty easily by either subclassing DBI or writing a DB
access module of your own (my preferred method).

my $db = MyDB->new;

my $sql = 'SELECT 1';
my $sth = $db->get_sth($sql);

$sth->execute or die $sth->errstr;
my ($numone) = $sth->fetchrow_array;
# finish() is doubly necessary with this caching scheme, since the
# handle is reused rather than destroyed!
$sth->finish or die $sth->errstr;

sub get_sth
{
    my $self = shift;
    my $sql = shift;

    return $self->{sth_cache}->{$sql} if exists $self->{sth_cache}->{$sql};

    $self->{sth_cache}->{$sql} = $self->{dbh}->prepare($sql) 
	or die $self->{dbh}->errstr;

    return $self->{sth_cache}->{$sql};
}

I've used that in a few situations and it appears to speed things up a
bit.

For mod_perl, we would probably want to make $self->{sth_cache} global.


You know, I just benchmarked this on a machine running PostgreSQL and it
didn't actually speed things up (or slow it down).  However, I suspect
that under mod_perl if this were something that were globally shared
inside a child process it might make a difference.  Plus it also depends
on the database used.

(Contributors: Randal L. Schwartz, Steve Willer, Michael Peppler, Mark
Cogan, Eric Hammond, Russell D. Weiss, Joshua Chamas, Ken Williams, Peter Grimes)

#################################################

As a quick side note, I actually found that it's faster to write the logs
directly into a .gz, and read them out of the .gz, through pipes.  It takes
longer (significantly, in my experience) to read 100 megs from the drive
than it does to compress or uncompress 5 megs of data.
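
A minimal, self-contained sketch of the technique (the file name and
line contents are made up; it assumes a gzip binary on the PATH):

```perl
use strict;
use warnings;

# Write the "log" straight into a .gz through a pipe to gzip ...
my $gz = "/tmp/piped_log_demo.gz";
open my $out, "|-", "gzip -c > $gz" or die "can't start gzip: $!";
print $out "line $_\n" for 1 .. 3;
close $out or die "gzip write failed: $!";

# ... and read it back through a pipe to gzip -dc; the uncompressed
# data never touches the disk.
open my $in, "-|", "gzip -dc $gz" or die "can't start gzip -dc: $!";
my @lines = <$in>;
close $in or die "gzip read failed: $!";
unlink $gz;

print scalar(@lines), " lines read back\n";
```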

#################################################

performance.pod - extend on Apache::TimeIt package

#################################################

Add a new section - contributing to the guide - with incentives and
guidelines of contributions (diff against pod...)

#################################################

security.pod : add Apache::Auth* modules

#################################################

examples of Apache::Session::DBI code:

use strict;
use DBI;
use Apache::Session::DBI;
use CGI;
use CGI::Carp qw(fatalsToBrowser);

# Recommendation from mod_perl_traps:
use Carp ();
local $SIG{__WARN__} = \&Carp::cluck;

[...]

# Initiate a session ID
my $session;
my $opts = {  autocommit => 0, 
              lifetime   => 3600 };     # 3600 is one hour

# Read in the cookie if this is an old session
my $r = Apache->request;
my $no_cookie = '';
my $cookie = $r->header_in('Cookie');
{
    # eliminate logging from Apache::Session::DBI's use of `warn'
    local $^W = 0;      

    if (defined($cookie) && $cookie ne '') {
        
        $cookie =~ s/SESSION_ID=(\w*)/$1/;
        $session = Apache::Session::DBI->open($cookie, $opts);
        $no_cookie = 'Y' unless defined($session);
    }

    # Could have been obsolete - get a new one
    $session = Apache::Session::DBI->new($opts) unless defined($session);

}

# Might be a new session, so let's give them a cookie back
if (! defined($cookie) || $no_cookie) {
    local $^W = 0;

    my $session_cookie = "SESSION_ID=$session->{'_ID'}";
    $r->header_out("Set-Cookie" => $session_cookie);
}


