A Web proxy server is a
specialized HTTP server.
The primary use of a proxy server is to allow internal clients
access to the Internet from behind a firewall. Anyone behind a
firewall can now have full Web access past the firewall host
with minimum effort and without compromising security.
The proxy server listens for requests from
clients within the firewall and forwards
these requests to remote internet servers outside the firewall.
The proxy server reads responses from the external servers and
then sends them to the internal clients.
In the usual case, all the clients within a
given subnet use the same proxy server. This
makes it possible for the proxy to cache documents efficiently
that are requested by a number of clients.
People using a proxy server should feel as
if they are getting responses directly from remote
servers.
Clients without Domain Name Services
(DNS) can still use the Web. The proxy
IP address is the only
information they need. Organizations using private network
address spaces such as the class A net 10.*.*.* can still
use the Internet as long as the proxy is visible to both
the private internal net and the Internet.
Most proxy servers are
implemented on a per-access method basis. Proxy servers
can allow or deny internet requests according to the
protocol of the requests. For instance, a proxy server
can allow calls to FTP servers while denying calls to
HTTP servers.
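The per-protocol decision described above can be sketched in a few lines of Python. This is only an illustration: the policy table `ALLOWED_SCHEMES` and the helper `is_allowed` are hypothetical names, not part of any particular proxy product. The policy mirrors the example in the text, allowing FTP while denying HTTP.

```python
from urllib.parse import urlsplit

# Hypothetical policy table: which access methods (URL schemes) the proxy forwards.
ALLOWED_SCHEMES = {"ftp": True, "http": False}

def is_allowed(url: str) -> bool:
    """Return True if the policy permits the protocol named in the URL."""
    scheme = urlsplit(url).scheme.lower()
    # Protocols missing from the table are denied by default in this sketch.
    return ALLOWED_SCHEMES.get(scheme, False)

print(is_allowed("ftp://ftp.example.com/pub/readme.txt"))
print(is_allowed("http://mycompany.com/information/ProxyDetails.html"))
```

Denying unknown protocols by default is a design choice made for this sketch; a real proxy might do the opposite.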
You can use a proxy server in a number of ways, including:
Some machines
on your local network might not be able to directly access Internet
resources. For instance, some browsers might not be able to directly
access Internet resources because they run on systems behind a
protective firewall. In these cases, a proxy server can retrieve
the desired files for them.
In Figure 1, the proxy server receives the request
from the browser in the form of a URL. The proxy server
retrieves the requested information, converts it to
HTML format and
sends it on to the browser behind the firewall. The
proxy server can handle all network requests if it is
the only machine directly connected to the Internet.
Figure 1 Proxy Server Running on a Firewall
Browser Access to the Internet

Usually, the clients within a subnet
access the same Web proxy server. Some proxy servers
let you cache internet documents for clients within
the local area network. Caching documents means keeping a local copy of internet
documents, so that the server doesn't need to
request them over and over again.
Caching is more effective on the
proxy server than on each client system. This saves
disk space because only a single copy is cached.
Because the same documents are often referenced by
multiple browsers, the proxy server can cache them
more efficiently. The system administrator can
predict which documents are worth caching for a long
time and which are not.
It is easy to configure an entire
workgroup to use the proxy server's cache of
documents. This reduces the load on the server by
allowing it to get information from the cache when
responding to subsequent client requests for the
same data.
Caching also makes it possible to browse
the Web even if a Web server, or even the external network,
is down, as long as one can connect to the proxy server.
This improves service to remote network resources, such
as busy FTP sites and transient Gopher servers that are
often unavailable remotely, but may be cached locally.
You can also cache a presentation you
plan to present elsewhere when you are unsure of the
location's Internet capabilities.
When using a proxy server it is possible to
filter client transactions at the protocol level. The proxy
can control access to services for individual methods, hosts,
and domains. Some proxy servers let you:
For a browser to use a proxy server, it must
channel its Internet requests through the proxy server. Most
browsers allow you to configure them so that they direct their
requests through a proxy server. Depending on the browser, you
can identify a proxy server by the server's
domain name or IP address. However, unless you configure the
browsers individually on your subnet to look for the proxy server,
they won't send their requests to it.
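The same configuration applies to any HTTP client, not only browsers. As a hedged illustration, the sketch below configures Python's standard `urllib.request` to direct HTTP and FTP requests through a proxy. The address `proxy.example.com:8080` is a placeholder assumption, and `build_proxy_opener` is a hypothetical helper name.

```python
import urllib.request

# Placeholder proxy address (an assumption); use your proxy's domain name or IP.
PROXY_URL = "http://proxy.example.com:8080"

def build_proxy_opener(proxy_url: str = PROXY_URL) -> urllib.request.OpenerDirector:
    """Build an opener that sends HTTP and FTP requests through the proxy."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "ftp": proxy_url})
    return urllib.request.build_opener(handler)

opener = build_proxy_opener()
# opener.open("http://mycompany.com/information/ProxyDetails.html") would now be
# routed through the proxy; the call is commented out because it needs a network.
```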
Organizations that use one or more private
network address spaces, such as class A
10.*.*.*, can still use the Internet. To access the Internet
they need to have a proxy server that is visible to the Internet
and to the private internal network(s).
Many clients have their own IP address and
a direct connection to
servers on the Internet. When a normal HTTP request is made by
the browser, the HTTP server gets only the path and keyword
portion of the requested URL. Other parts of the URL, such as
the protocol specifier "http:" and the host name,
are clear to the remote HTTP server. The remote server knows
that it is an HTTP server, and it knows the host machine
on which it is running (see
Figure 2). The requested path specifies the document or a CGI
program on the local filesystem of the server, or some other
resource available from that server.
When a user enters:
http://mycompany.com/information/ProxyDetails.html
The browser converts it to:
GET /information/ProxyDetails.html
The browser connects to the server
running on mycompany.com and issues the command and
waits for a response. In this example, the browser makes
a request to the HTTP server and specifies the requested
resource relative to that server; there is neither a protocol
nor a host name specifier in the request.
Figure 2 A Normal Web Transaction
The request specifies
the path (Data Directory) of the information and the ProxyDetails.html
document located in the Data Directory. The response is a
document or an error message.
The user in this example could just
as easily use FTP://, in which case the client sends
the request to the specified FTP server.
The proxy server
acts as both a server system and a client system. It is a server
when accepting HTTP requests from browsers, and acts as a
client system when its browser software connects to remote
servers to retrieve documents.
The proxy server uses
the header fields passed to it by the browser
without modification when it connects to the remote server.
This means the browser does not lose any functionality when
going through a proxy.
A complete proxy server
should be able to communicate using all the Web protocols, the most
important ones being HTTP, FTP, Gopher, and WAIS. Proxies that
handle only a single Internet protocol, such as HTTP, are
possible, but a Web browser would then require access to
other proxy servers to handle the remaining protocols.
When a browser sends a request through a
proxy server, the
browser always uses HTTP for
the transactions with the proxy server. This is true even when
the user wants to access a remote server that uses another
protocol; for example, FTP.
Instead of specifying only the pathname and
search keywords to the proxy server, the browser specifies
the full URL. This way the proxy server has all the information
necessary to make the actual request to the remote server
specified in the request URL, using the protocol specified
in the URL.
The only difference between a normal
and proxied HTTP transaction is that HTTP transactions
routed through a proxy server require a full URL.
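To make the difference concrete, here is a minimal Python sketch of how a client might build the request line in each case. The function name `request_line` is hypothetical, and the sketch follows this document's simplified examples by omitting the HTTP version and header fields.

```python
from urllib.parse import urlsplit

def request_line(url: str, via_proxy: bool) -> str:
    """Build the GET line a browser would send for the given URL."""
    if via_proxy:
        # Proxied transaction: the full URL goes on the request line.
        return f"GET {url}"
    parts = urlsplit(url)
    # Normal transaction: only the path and any search keywords are sent.
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    return f"GET {path}"

url = "http://mycompany.com/information/ProxyDetails.html"
print(request_line(url, via_proxy=False))  # GET /information/ProxyDetails.html
print(request_line(url, via_proxy=True))   # GET http://mycompany.com/information/ProxyDetails.html
```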
When you use a proxy server as a client
system, it acts as a browser to receive documents. The
following is a typical example of a proxied HTTP request:
When you enter a full URL, for example:
http://mycompany.com/information/ProxyDetails.html
The browser converts the URL to:
GET http://mycompany.com/information/ProxyDetails.html
The browser then connects to the proxy server, and
then the proxy server provides the connection to the Internet.
The proxy server converts this request
to:
GET /information/ProxyDetails.html
The proxy server connects to the server
running on mycompany.com, issues the
command, and waits for a response. The remote server returns
the response to the proxy server, which then returns
the response to the client.
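The conversion the proxy performs can be sketched as follows. This is an illustrative Python fragment, not the implementation of any specific proxy server; `split_proxied_request` is a hypothetical helper, and header fields are ignored.

```python
from urllib.parse import urlsplit

def split_proxied_request(line: str) -> tuple[str, str, str]:
    """Split a proxied request line into (protocol, host, rewritten request line)."""
    method, url = line.split(" ", 1)
    parts = urlsplit(url)
    # The remote server only needs the path and any search keywords.
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    return parts.scheme, parts.netloc, f"{method} {path}"

protocol, host, rewritten = split_proxied_request(
    "GET http://mycompany.com/information/ProxyDetails.html"
)
print(protocol, host, rewritten)
```

The proxy uses the protocol and host to decide where and how to connect, then sends only the rewritten line to the remote server.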
Figure 3
shows a browser making a request to the proxy server using
HTTP and specifying a full URL.
The figure shows that the URL passed between the proxy server
and the remote server specifies neither the remote host
name nor the HTTP protocol.
Figure 3 An HTTP Transaction via a Proxy Server
Figure 4
shows a browser request via a proxy server using HTTP even though the request
specifies a document on an FTP server on the Internet. The proxy server sees
from the full URL that it should make an FTP connection. The proxy server makes
the connection and retrieves the file from the remote FTP server and sends it to
the browser using HTTP. In this case, the proxy server returns an FTP directory
listing as an HTML document.
Figure 4 An FTP Transaction via a Proxy Server
Configuring Browsers to Use the Proxy Server
Providing Internet Access for Companies Using Private Networks
An Ordinary Web Transaction Via a Server
Communication Via a Proxy Server
HTTP Browser Request to Remote HTTP Transaction
http://mycompany.com/information/ProxyDetails.html
GET http://mycompany.com/information/ProxyDetails.html
GET /information/ProxyDetails.html
HTTP Browser Request to Remote FTP Transaction

Caching documents means storing documents locally so users do not
have to connect to a remote server to get files. When a local browser
requests a file, the server checks its cache to see if it has the
document. If the file exists in the cache, the server serves the
local copy to the browser. If you cache documents, you need to decide:
Figure 5
shows a proxy server caching
a document retrieved from a remote server. The client (or other
clients) can request and receive this locally stored document at
a later time.
Figure 5 Caching Documents on a Proxy Server
If an up-to-date version of the requested
document is found in the cache of the proxy server, no connection
to the remote server is necessary, as shown in Figure 6.
Figure 6 Retrieving Cached Documents
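The lookup shown in Figures 5 and 6 can be approximated with a small Python sketch. It rests on several assumptions: `DocumentCache` is a hypothetical class, the store is an in-memory dictionary rather than a disk-based cache, and the sketch does not check whether a cached copy is still up to date.

```python
from typing import Callable, Dict

class DocumentCache:
    """In-memory stand-in for a proxy's document cache (a real one is disk based)."""

    def __init__(self, fetch: Callable[[str], bytes]) -> None:
        self._fetch = fetch                  # function that contacts the remote server
        self._store: Dict[str, bytes] = {}
        self.remote_fetches = 0              # counts connections to remote servers

    def get(self, url: str) -> bytes:
        if url in self._store:
            return self._store[url]          # cache hit: no remote connection needed
        self.remote_fetches += 1
        body = self._fetch(url)              # cache miss: retrieve and keep a copy
        self._store[url] = body
        return body

# A fake fetch function stands in for the remote server in this example.
cache = DocumentCache(lambda url: b"contents of " + url.encode())
cache.get("http://mycompany.com/information/ProxyDetails.html")
cache.get("http://mycompany.com/information/ProxyDetails.html")
print(cache.remote_fetches)  # 1
```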
Caching documents can save users
considerable time when they request documents normally
located out on the Internet. A proxy server can serve
these documents much more quickly than remote servers.
In addition, caching a document that many users need can
save considerable network cost and connection time. Caching
can also reduce the amount of disk space browsers use because
many local browsers can use a single copy of a cached document.
Caching is disk based; when you restart
the server, documents that you cache are still available.
If you want, you can also configure the proxy server to use
only the local cache. For instance, you can provide Internet
documents to local browsers that do not have an internet
connection.
Many documents available on the
Internet are "living" documents.
Determining when documents should be updated or deleted can be
a difficult task. Some documents can remain stable for a very
long time and then suddenly change. Other documents can change
on a weekly or a daily basis. This means you need to decide
carefully how often to refresh or delete
the documents held in cache.
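One simple way to express such a decision is a maximum age per document. The Python sketch below is an assumption-laden illustration: `CachedDocument`, its `max_age` field, and `needs_refresh` are hypothetical names, and real proxy servers may use different or more elaborate refresh rules.

```python
import time
from dataclasses import dataclass
from typing import Optional

@dataclass
class CachedDocument:
    body: bytes
    stored_at: float   # seconds since the epoch when the copy was cached
    max_age: float     # hypothetical lifetime chosen by the administrator, in seconds

def needs_refresh(doc: CachedDocument, now: Optional[float] = None) -> bool:
    """Return True once the cached copy is older than its allowed lifetime."""
    if now is None:
        now = time.time()
    return now - doc.stored_at > doc.max_age

# A document that changes daily might get a one-day lifetime.
daily = CachedDocument(body=b"...", stored_at=time.time(), max_age=24 * 60 * 60)
print(needs_refresh(daily))
```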
Chaining proxy servers
lets you run a proxy server as a local cache on behalf of a
department within an organization. The
individual departments have control over the server and cache.
These departmental proxy servers can connect to a proxy server
on a firewall between the Internet and the organization. This proxy server talks to the Internet as
shown in Figure 7.
Any restrictions for access set for the organization
proxy server take precedence over access restrictions
set for the departmental proxy servers.
For example, departmental proxy server 1 might
be set to allow all URL requests. The
organizational proxy server, as corporate policy, might be
set to deny all URL requests for certain online publications.
A request for one of these publications coming into proxy server
1 would be forwarded to the organizational proxy server. The
organizational proxy server would then deny the request.
Figure 7 Proxy Linking
Advantages of Caching on a Proxy Server
Managing Cached Documents
Proxy Server-to-Proxy Server Linking

Conversely, proxy server 1 could be configured to deny URLs going to a designated FTP site while proxy servers 2 and 3 and the organizational server are all allowed access to the site.
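The effect of chaining on access restrictions can be sketched in Python as follows. The rule functions and host names are made up for illustration; the only point taken from the text is that a request must pass every proxy server it travels through, so a denial at any level blocks it.

```python
from typing import Callable, List

Rule = Callable[[str], bool]  # returns True if the proxy allows the URL

def chain_allows(url: str, chain: List[Rule]) -> bool:
    """A request succeeds only if every proxy server in the chain allows it."""
    return all(rule(url) for rule in chain)

# Hypothetical policies: departmental proxy server 1 allows all URL requests,
# while the organizational proxy server denies a made-up publication host.
def department_1(url: str) -> bool:
    return True

def organization(url: str) -> bool:
    return "publication.example.com" not in url

print(chain_allows("http://publication.example.com/today", [department_1, organization]))
print(chain_allows("http://mycompany.com/information/ProxyDetails.html", [department_1, organization]))
```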