YALS.NET - Man page for perlfaq9 (1)

PERLFAQ9(1)	       Perl Programmers Reference Guide 	   PERLFAQ9(1)



NAME
       perlfaq9 - Networking ($Revision: 1.28 $, $Date: 2005/12/31 00:54:37 $)

DESCRIPTION
       This section deals with questions related to networking, the internet,
       and a few on the web.

       What is the correct form of response from a CGI script?

       (Alan Flavell answers...)

       The Common Gateway Interface (CGI) specifies a software interface
       between a program ("CGI script") and a web server (HTTPD). It is not
       specific to Perl, and has its own FAQs and tutorials, as well as a
       Usenet group, comp.infosystems.www.authoring.cgi.

       The CGI specification is outlined in an informational RFC:
       http://www.ietf.org/rfc/rfc3875

       Other relevant documentation is listed in:
       http://www.perl.org/CGI_MetaFAQ.html

       These Perl FAQs very selectively cover some CGI issues. However, Perl
       programmers are strongly advised to use the CGI.pm module, to take care
       of the details for them.

       The similarity between CGI response headers (defined in the CGI
       specification) and HTTP response headers (defined in the HTTP
       specification, RFC2616) is intentional, but can sometimes be confusing.

       The CGI specification defines two kinds of script: the "Parsed Header"
       script, and the "Non Parsed Header" (NPH) script. Check your server
       documentation to see what it supports. "Parsed Header" scripts are
       simpler in various respects. The CGI specification allows any of the
       usual newline representations in the CGI response (it's the server's
       job to create an accurate HTTP response based on it). So "\n" written
       in text mode is technically correct, and recommended. NPH scripts are
       trickier: they must put out a complete and accurate set of HTTP
       transaction response headers; the HTTP specification calls for records
       to be terminated with carriage-return and line-feed, i.e. ASCII
       \015\012 written in binary mode.
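
       As a rough illustration of the difference, here is a minimal sketch
       that builds both kinds of response by hand. The helper names
       parsed_header_response() and nph_response() are made up for this
       example, and the "HTTP/1.0 200 OK" status line is only one plausible
       choice; check your server documentation before relying on either form.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of any module): a "Parsed Header" response.
# Plain "\n" is enough because the server turns this into real HTTP.
sub parsed_header_response {
    my ($type, $body) = @_;
    return "Content-Type: $type\n\n$body";
}

# Hypothetical helper: an NPH response. The script has to supply the
# status line itself and end every header record with CR LF.
sub nph_response {
    my ($type, $body) = @_;
    my $crlf = "\015\012";
    return "HTTP/1.0 200 OK$crlf"
         . "Content-Type: $type$crlf"
         . $crlf
         . $body;
}

# Parsed Header form, written in text mode. To send the NPH form instead,
# call binmode(STDOUT) first so the CR LF pairs go out unchanged.
print parsed_header_response('text/plain', "Hello, world\n");
```

       In practice the server decides which form it expects, so treat the
       NPH half as something to use only when the server is configured for
       it.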

       Using CGI.pm gives excellent platform independence, including EBCDIC
       systems. CGI.pm selects an appropriate newline representation
       ($CGI::CRLF) and sets binmode as appropriate.

       My CGI script runs from the command line but not the browser.  (500
       Server Error)

       Several things could be wrong.  You can go through the "Troubleshooting
       Perl CGI scripts" guide at

	       http://www.perl.org/troubleshooting_CGI.html

       If, after that, you can demonstrate that you've read the FAQs and that
       your problem isn't something simple that can be easily answered, you'll
       probably receive a courteous and useful reply to your question if you
       post it on comp.infosystems.www.authoring.cgi (if it's something to do
       with HTTP or the CGI protocols).  Questions that appear to be Perl
       questions but are really CGI ones that are posted to
       comp.lang.perl.misc are not so well received.

       The useful FAQs, related documents, and troubleshooting guides are
       listed in the CGI Meta FAQ:

	       http://www.perl.org/CGI_MetaFAQ.html

       How can I get better error messages from a CGI program?

       Use the CGI::Carp module.  It replaces "warn" and "die", plus the
       normal Carp module's "carp", "croak", and "confess" functions, with
       more verbose and safer versions.  It still sends them to the normal
       server error log.

	   use CGI::Carp;
	   warn "This is a complaint";
	   die "But this one is serious";

       The following use of CGI::Carp also redirects errors to a file of your
       choice, placed in a BEGIN block to catch compile-time warnings as well:

	   BEGIN {
	       use CGI::Carp qw(carpout);
	       open(LOG, ">>/var/local/cgi-logs/mycgi-log")
		   or die "Unable to append to mycgi-log: $!\n";
	       carpout(*LOG);
	   }

       You can even arrange for fatal errors to go back to the client browser,
       which is nice for your own debugging, but might confuse the end user.

	   use CGI::Carp qw(fatalsToBrowser);
	   die "Bad error here";

       Even if the error happens before you get the HTTP header out, the
       module will try to take care of this to avoid the dreaded server 500
       errors.  Normal warnings still go out to the server error log (or
       wherever you've sent them with "carpout") with the application name
       and date stamp prepended.

       How do I remove HTML from a string?

       The most correct way (albeit not the fastest) is to use HTML::Parser
       from CPAN.  Another mostly correct way is to use HTML::FormatText, which
       not only removes HTML but also attempts to do a little simple
       formatting of the resulting plain text.

       Many folks attempt a simple-minded regular expression approach, like
       "s/<.*?>//g", but that fails in many cases because the tags may
       continue over line breaks, they may contain quoted angle-brackets, or
       HTML comments may be present.  Plus, folks forget to convert
       entities--like "&lt;" for example.

       Here's one "simple-minded" approach that works for most files:

	   #!/usr/bin/perl -p0777
	   s/<(?:[^>'"]*|(['"]).*?\1)*>//gs
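
       To see what the quote-aware pattern buys you, the following sketch
       runs both substitutions over one made-up tag whose ALT attribute
       contains a quoted ">". The sample string and variable names are
       assumptions for illustration only; neither pattern copes with
       comments or scripts.

```perl
use strict;
use warnings;

# Made-up sample: the ALT attribute holds a quoted angle bracket.
my $html = qq{<IMG SRC="foo.gif" ALT="A > B">caption};

# Naive pattern: stops at the first ">", even inside quotes.
(my $naive = $html) =~ s/<.*?>//g;

# Quote-aware pattern: skips over single- or double-quoted strings.
(my $quoted = $html) =~ s/<(?:[^>'"]*|(['"]).*?\1)*>//gs;

print "naive:  [$naive]\n";
print "quoted: [$quoted]\n";
```

       The naive version leaves the tail of the attribute behind, while the
       quote-aware version removes the whole tag.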

       If you want a more complete solution, see the 3-stage striphtml program
       in http://www.cpan.org/authors/Tom_Christiansen/scripts/striphtml.gz .

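       Here are some tricky cases that you should think about when picking a
       solution:

	   <IMG SRC = "foo.gif" ALT = "A > B">

	   <IMG SRC = "foo.gif"
		ALT = "A > B">

	   <!-- <A comment> -->

	   <script>if (a<b && a>c)</script>

	   <# Just data #>

	   <![INCLUDE CDATA [ >>>>>>>>>>>> ]]>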

       If HTML comments include other tags, those solutions would also break
       on text like this:

	   <!-- This section commented out.
	       <B>You can't see me!</B>
	   -->

       How do I extract URLs?

       You can easily extract all sorts of URLs from HTML with
       "HTML::SimpleLinkExtor", which handles anchors, images, objects,
       frames, and many other tags that can contain a URL.  If you need
       anything more complex, you can create your own subclass of
       "HTML::LinkExtor" or "HTML::Parser".  You might even use
       "HTML::SimpleLinkExtor" as an example for something specifically
       suited to your needs.

       You can use URI::Find to extract URLs from an arbitrary text document.

       Less complete solutions involving regular expressions can save you a
       lot of processing time if you know that the input is simple.  One
       solution from Tom Christiansen runs 100 times faster than most
       module-based approaches but only extracts URLs from anchors where the
       first attribute is HREF and there are no other attributes.

	       #!/usr/bin/perl -n00
	       # qxurl - tchrist@perl.com
	       print "$2\n" while m{
		   < \s*
		     A \s+ HREF \s* = \s* (["']) (.*?) \1
		   \s* >
	       }gsix;
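
       The limitation is easy to see on a small sample. This sketch applies
       the same kind of pattern to a made-up string instead of standard
       input; the sample markup and the @urls array are assumptions for
       illustration.

```perl
use strict;
use warnings;

# Made-up input: two simple anchors and one with an extra attribute.
my $html = <<'HTML';
<a href="http://www.perl.org/">Perl</a>
<A HREF='http://www.cpan.org/'>CPAN</A>
<a class="x" href="http://example.com/">skipped</a>
HTML

# Collect the URL from every anchor whose only attribute is HREF.
my @urls;
while ($html =~ m{ < \s* A \s+ HREF \s* = \s* (["']) (.*?) \1 \s* > }gsix) {
    push @urls, $2;
}

print "$_\n" for @urls;
```

       Only the first two anchors are reported; the third is skipped
       because HREF is not its first attribute.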

       How do I download a file from the user's machine?  How do I open a file
       on another machine?

       In this case, download means to use the file upload feature of HTML
       forms.  You allow the web surfer to specify a file to send to your web
       server.  To you it looks like a download, and to the user it looks like
       an upload.  No matter what you call it, you do it with what's known as
       multipart/form-data encoding.  The CGI.pm module (which comes with Perl
       as part of the Standard Library) supports this in the
       start_multipart_form() method, which isn't the same as the startform()
       method.

       See the section in the CGI.pm documentation on file uploads for code
       examples and details.
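
       For orientation only, here is a sketch of the kind of form the
       browser needs to see, printed by hand without CGI.pm. The
       upload_form() helper, the action URL, and the field name "upload"
       are all made up for this example; reading the uploaded data on the
       server side is best left to CGI.pm.

```perl
use strict;
use warnings;

# Hypothetical helper: returns an HTML form that uploads a single file.
# The enctype attribute is what selects multipart/form-data encoding.
sub upload_form {
    my ($action) = @_;
    return <<"HTML";
<form method="post" action="$action" enctype="multipart/form-data">
  <input type="file" name="upload">
  <input type="submit" value="Send file">
</form>
HTML
}

# Assumed script location; adjust to wherever your handler lives.
print "Content-Type: text/html\n\n", upload_form('/cgi-bin/upload.cgi');
```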

       How do I make a pop-up menu in HTML?

       Use the