Web Proxy Servers

What is a Web Proxy Server?

A Web proxy server is a specialized HTTP server. The primary use of a proxy server is to allow internal clients access to the Internet from behind a firewall. Anyone behind a firewall can now have full Web access past the firewall host with minimum effort and without compromising security.

The proxy server listens for requests from clients within the firewall and forwards these requests to remote internet servers outside the firewall. The proxy server reads responses from the external servers and then sends them to the internal clients.

In the usual case, all the clients within a given subnet use the same proxy server. This makes it possible for the proxy to efficiently cache documents that are requested by a number of clients.

People using a proxy server should feel as if they are getting responses directly from remote servers.

Clients without Domain Name Services (DNS) can still use the Web. The proxy IP address is the only information they need. Organizations using private network address spaces such as the class A net 10.*.*.* can still use the Internet as long as the proxy is visible to both the private internal net and the Internet.

Most proxy servers are implemented on a per-access method basis. Proxy servers can allow or deny internet requests according to the protocol of the requests. For instance, a proxy server can allow calls to FTP servers while denying calls to HTTP servers.
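
The protocol-based filtering described above might look like the following minimal Python sketch. The allow list and the function name are assumptions made for illustration; a real proxy server would read its policy from its own configuration.

```python
from urllib.parse import urlsplit

# Hypothetical policy for illustration: this proxy forwards FTP requests only.
ALLOWED_SCHEMES = {"ftp"}

def is_request_allowed(url, allowed_schemes=ALLOWED_SCHEMES):
    """Return True if the protocol (URL scheme) of the request is permitted."""
    scheme = urlsplit(url).scheme.lower()
    return scheme in allowed_schemes
```

With this policy, a request for ftp://ftp.example.com/pub/file.txt is allowed, while a request for http://www.example.com/ is denied.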

When Web Proxy Servers are Useful

You can use a proxy server in a number of ways, including:

Browser Access to the Internet

Some machines on your local network might not be able to access Internet resources directly. For instance, browsers that run on systems behind a protective firewall might have no direct route to the Internet. In these cases, a proxy server can retrieve the desired files for them.

In Figure 1, the Proxy server is running on a firewall host and making connections to the outside world using firewall software. You could also run the proxy server on another internal host that has full internet access, or on a machine inside the firewall.

The proxy server receives the request from the browser in the form of a URL. The proxy server retrieves the requested information, converts it to HTML format if necessary, and sends it on to the browser behind the firewall. The proxy server can handle all network requests if it is the only machine directly connected to the Internet.

Figure 1 Proxy Server Running on a Firewall


Caching Documents

Usually, the clients within a subnet access the same Web proxy server. Some proxy servers let you cache internet documents for clients within the local area network. Caching documents means keeping a local copy of internet documents, so that the server doesn’t need to request them over and over again.

Caching is more effective on the proxy server than on each client system. It saves disk space because only a single copy of each document is cached, and documents that are referenced by multiple browsers are cached once for everyone. The system administrator can predict which documents are worth caching for a long time and which are not.

It is easy to configure an entire workgroup to use the proxy server’s cache of documents. This reduces the load on the server by allowing it to get information from the cache when responding to subsequent client requests for the same data.

Caching also makes it possible to browse the Web even if a Web server, or even the external network, is down, as long as one can connect to the proxy server. This improves service to remote network resources, such as busy FTP sites and transient Gopher servers that are often unavailable remotely, but may be cached locally.

You can also cache a presentation you plan to present elsewhere when you are unsure of the location’s Internet capabilities.

Selectively Controlling Access to the Internet and Subnets

When using a proxy server it is possible to filter client transactions at the protocol level. The proxy can control access to services for individual methods, hosts, and domains. Some proxy servers let you:

Configuring Browsers to Use the Proxy Server

For browsers to use a proxy server, they must channel their internet requests through it. Most browsers allow you to configure them so that they direct their requests through a proxy server. Depending on the browser, you identify the proxy server by its domain name or IP address. However, unless you configure the browsers on your subnet individually to look for the proxy server, they won’t send their requests to it.
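
The same idea applies to any HTTP client, not only browsers. As a minimal sketch, a Python program can be pointed at a proxy with the standard library's urllib.request.ProxyHandler. The proxy address below is a hypothetical placeholder, not a value from this document.

```python
import urllib.request

# Hypothetical proxy address; replace it with your proxy's host name or IP address.
PROXY_URL = "http://proxy.example.com:8080"

def build_proxied_opener(proxy_url=PROXY_URL):
    """Build an opener that sends HTTP and FTP requests through one proxy."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "ftp": proxy_url})
    return urllib.request.build_opener(handler)

opener = build_proxied_opener()
# opener.open("http://mycompany.com/information/ProxyDetails.html") would now
# be routed through the proxy. The call is left commented out because it
# needs a reachable proxy server.
```

As with a browser, the client only needs to know the proxy's address. Nothing is sent to the proxy until a request is actually opened.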

Providing Internet Access for Companies Using Private Networks

Organizations that use one or more private network address spaces, such as class A 10.*.*.*, can still use the Internet. To access the Internet they need to have a proxy server that is visible to the Internet and to the private internal network(s).
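
As a small illustration, the Python standard library can tell whether a client address falls inside the private class A network mentioned above, and therefore needs a proxy to reach the Internet. The function name is an assumption made for this sketch.

```python
import ipaddress

# The private class A network written as 10.*.*.* in the text.
PRIVATE_NET = ipaddress.ip_network("10.0.0.0/8")

def is_on_private_net(address):
    """Return True if the address belongs to the 10.0.0.0/8 private network."""
    return ipaddress.ip_address(address) in PRIVATE_NET
```

For example, is_on_private_net("10.1.2.3") is true, while an address outside the private range, such as 192.0.2.1, gives false.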

An Ordinary Web Transaction Via a Server

Many clients have their own IP address and a direct connection to servers on the Internet. When the browser makes a normal HTTP request, the HTTP server gets only the path and keyword portion of the requested URL. Other parts of the URL, such as the protocol specifier "http:" and the host name, are implicit to the remote HTTP server. The remote server knows that it is an HTTP server, and it knows the host machine on which it is running (see Figure 2). The requested path specifies a document or a CGI program on the local filesystem of the server, or some other resource available from that server.

When a user enters:

http://mycompany.com/information/ProxyDetails.html

The browser converts it to:

GET /information/ProxyDetails.html

The browser connects to the server running on mycompany.com, issues the command, and waits for a response. In this example, the browser makes a request to the HTTP server and specifies the requested resource relative to that server; the request contains neither a protocol specifier nor a host name.
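
The conversion shown above can be sketched in Python as follows. The function name is hypothetical, and the request line follows the simplified form used in this document (no HTTP version or headers).

```python
from urllib.parse import urlsplit

def direct_request(url):
    """Return (host, request_line) for a request sent straight to the server."""
    parts = urlsplit(url)
    path = parts.path or "/"      # an empty path means the server root
    if parts.query:               # keep any search keywords
        path += "?" + parts.query
    return parts.hostname, "GET " + path
```

The host name is used only to decide where to connect. It does not appear in the command itself.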

Figure 2 A Normal Web Transaction


The request specified the path (Data Directory) of information and the ProxyDetails.html document located in the Data Directory. The response is a document or an error message.

The user in this example could just as easily use an ftp:// URL, in which case the client sends the request to the specified FTP server.

Communication Via a Proxy Server

The proxy server acts as both a server system and a client system. It is a server when accepting HTTP requests from browsers, and acts as a client system when its client software connects to remote servers to retrieve documents.

The proxy server uses the header fields passed to it by the browser without modification when it connects to the remote server. This means the browser does not lose any functionality when going through a proxy.

A complete proxy server should be able to communicate all the Web protocols, the most important ones being HTTP, FTP, Gopher, and WAIS. Proxies that handle only a single Internet protocol, such as HTTP, are possible, but a Web browser would then require access to other proxy servers to handle the remaining protocols.

When a browser sends a request through a proxy server, the browser always uses HTTP for the transactions with the proxy server. This is true even when the user wants to access a remote server that uses another protocol; for example, FTP.

Instead of specifying only the pathname and search keywords to the proxy server, the browser specifies the full URL. This way the proxy server has all the information necessary to make the actual request to the remote server specified in the request URL, using the protocol specified in the URL.

The only difference between a normal and proxied HTTP transaction is that HTTP transactions routed through a proxy server require a full URL.

HTTP Browser Request to Remote HTTP Transaction

When you use a proxy server as a client system, it acts as a browser to receive documents. The following is a typical example of a proxied HTTP request:

When you enter a full URL, for example:
http://mycompany.com/information/ProxyDetails.html

The browser converts the URL to:
GET http://mycompany.com/information/ProxyDetails.html

The browser then connects to the proxy server, which provides the connection to the Internet.

The proxy server converts this request to:
GET /information/ProxyDetails.html

The proxy server connects to the server running on mycompany.com, issues the command, and waits for a response. The remote server returns the response to the proxy server, which then returns the response to the client.
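
The rewriting step performed by the proxy can be sketched in Python as below. The function name is hypothetical, and the request lines follow the simplified form used in this document rather than a complete HTTP request.

```python
from urllib.parse import urlsplit

def rewrite_for_remote(proxy_request_line):
    """Split 'GET <full URL>' into (protocol, host, 'GET <path>')."""
    method, url = proxy_request_line.split(" ", 1)
    parts = urlsplit(url)
    path = parts.path or "/"      # an empty path means the server root
    if parts.query:               # keep any search keywords
        path += "?" + parts.query
    return parts.scheme, parts.hostname, method + " " + path
```

The protocol and host name tell the proxy where and how to connect. Only the path is passed on to the remote server.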

Figure 3 shows a browser making a request to the proxy server using HTTP and specifying a full URL. The figure shows that the URL passed between the proxy server and the remote server specifies neither the remote host name nor the HTTP protocol.

Figure 3 An HTTP Transaction via a Proxy Server


HTTP Browser Request to Remote FTP Transaction

Figure 4 shows a browser request via a proxy server using HTTP even though the request specifies a document on an FTP server on the Internet. The proxy server sees from the full URL that it should make an FTP connection. The proxy server makes the connection, retrieves the file from the remote FTP server, and sends it to the browser using HTTP. In this case, the proxy server returns an FTP directory listing as an HTML document.
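
How a proxy might turn an FTP directory listing into an HTML document is sketched below. This is an illustration only: the function name and page layout are assumptions, and the list of entry names is taken as given rather than fetched from a real FTP server.

```python
from html import escape
from urllib.parse import quote

def listing_to_html(ftp_url, names):
    """Render FTP directory entries as a simple HTML page of links."""
    base = ftp_url if ftp_url.endswith("/") else ftp_url + "/"
    items = "\n".join(
        '<li><a href="{0}">{1}</a></li>'.format(
            escape(base + quote(name), quote=True),  # percent-encode, then escape
            escape(name),
        )
        for name in names
    )
    return "<html><body><h1>{0}</h1>\n<ul>\n{1}\n</ul></body></html>".format(
        escape(base), items
    )
```

Each entry becomes a link with a full ftp:// URL, so a follow-up request from the browser again reaches the proxy as a complete URL.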

Figure 4 An FTP Transaction via a Proxy Server


Advantages and Disadvantages of Caching Documents

Caching documents means storing documents locally so users do not have to connect to a remote server to get files. When a local browser requests a file, the server checks its cache to see if it has the document. If the file exists in the cache, the server serves the local copy to the browser. If you cache documents you need to decide:

Figure 5 shows a proxy server caching a document retrieved from a remote server. The client (or other clients) can request and receive this locally stored document at a later time.

 

Figure 5 Caching Documents on a Proxy Server


If an up-to-date version of the requested document is found in the cache of the proxy server, no connection to the remote server is necessary, as shown in Figure 6.
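
The lookup in Figures 5 and 6 can be sketched as follows. This is a simplified illustration that keeps documents in memory and treats a copy as up to date for a fixed number of seconds. The class, the function names, and the fetch_remote callback are all assumptions made for the example.

```python
import time

class ProxyCache:
    """A tiny cache keyed by URL; entries older than max_age are stale."""

    def __init__(self, max_age_seconds):
        self.max_age = max_age_seconds
        self._store = {}  # url -> (time fetched, document body)

    def put(self, url, body, now=None):
        self._store[url] = (time.time() if now is None else now, body)

    def get(self, url, now=None):
        """Return the cached body if present and up to date, else None."""
        entry = self._store.get(url)
        if entry is None:
            return None
        fetched_at, body = entry
        now = time.time() if now is None else now
        if now - fetched_at > self.max_age:
            return None  # stale copy: the caller must go to the remote server
        return body

def fetch(url, cache, fetch_remote, now=None):
    """Serve from the cache when possible; otherwise fetch and store."""
    body = cache.get(url, now)
    if body is None:
        body = fetch_remote(url)  # connection to the remote server (Figure 5)
        cache.put(url, body, now)
    return body
```

Only the first request for a document, or a request made after the copy has gone stale, results in a call to fetch_remote. Every other request is answered from the cache, as in Figure 6.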

Figure 6 Retrieving Cached Documents



Advantages of Caching on a Proxy Server

Caching documents can save users considerable time when they request documents normally located out on the Internet. A proxy server can serve these documents much more quickly than remote servers. In addition, caching a document that many users need can save considerable network cost and connection time. Caching can also reduce the amount of disk space browsers use because many local browsers can use a single copy of a cached document.

Caching is disk based; when you restart the server, documents that you cache are still available. If you want, you can also configure the proxy server to use only the local cache. For instance, you can provide Internet documents to local browsers that do not have an internet connection.

Managing Cached Documents

Many documents available on the Internet are "living" documents. Determining when documents should be updated or deleted can be a difficult task. Some documents can remain stable for a very long time and then suddenly change. Other documents can change on a weekly or a daily basis. This means you need to decide carefully how often to refresh or delete the documents held in cache.
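
One way to express such decisions is a table of refresh intervals matched against URL patterns. The sketch below is hypothetical: the patterns, the intervals, and the function names are assumptions made for illustration, not settings of any particular proxy server.

```python
from fnmatch import fnmatchcase

# Hypothetical refresh policy: (URL pattern, maximum age in seconds).
REFRESH_RULES = [
    ("http://news.example.com/*", 60 * 60),        # changes often: refresh hourly
    ("http://docs.example.com/*", 7 * 24 * 3600),  # stable: refresh weekly
]
DEFAULT_MAX_AGE = 24 * 3600  # everything else: refresh daily

def max_age_for(url):
    """Return the refresh interval of the first rule that matches the URL."""
    for pattern, max_age in REFRESH_RULES:
        if fnmatchcase(url, pattern):
            return max_age
    return DEFAULT_MAX_AGE

def needs_refresh(url, age_seconds):
    """Return True if a cached copy of this age should be fetched again."""
    return age_seconds > max_age_for(url)
```

Under this policy a two-hour-old copy of a news page is refreshed, while a two-hour-old copy of a manual page is still served from the cache.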

Proxy Server-to-Proxy Server Linking

Chaining proxy servers lets you run a proxy server as a local cache on behalf of a department within an organization. The individual departments have control over the server and cache. These departmental proxy servers can connect to a proxy server on a firewall between the Internet and the organization. This proxy server talks to the Internet as shown in Figure 7.

Any restrictions for access set for the organization proxy server take precedence over access restrictions set for the departmental proxy servers.

For example, departmental proxy server 1 might be set to allow all URL requests. The organizational proxy server, as corporate policy, might be set to deny all URL requests for certain online publications. A request for one of these publications coming into proxy server 1 would be forwarded to the organizational proxy server. The organizational proxy server would then deny the request.

Figure 7 Proxy Linking


Conversely, proxy server 1 could be configured to deny URLs going to a designated FTP site while proxy servers 2 and 3 and the organizational server are all allowed access to the site.
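
The way restrictions combine along a chain can be sketched as follows. For brevity the sketch applies both examples above at once (departmental proxy 1 denies one FTP site, and the organizational proxy denies an online publication). The class, the host names, and the result strings are assumptions made for illustration.

```python
from fnmatch import fnmatchcase

class Proxy:
    """A proxy with its own deny list and an optional parent proxy."""

    def __init__(self, name, denied_patterns=(), parent=None):
        self.name = name
        self.denied = list(denied_patterns)
        self.parent = parent

    def allows(self, url):
        return not any(fnmatchcase(url, p) for p in self.denied)

    def handle(self, url):
        """Apply this proxy's rules, then pass the request up the chain."""
        if not self.allows(url):
            return "denied by " + self.name
        if self.parent is not None:
            return self.parent.handle(url)
        return "forwarded to Internet"

# Hypothetical set-up mirroring Figure 7.
organization = Proxy("organization", ["http://magazine.example.com/*"])
department1 = Proxy("department 1", ["ftp://ftp.example.com/*"], parent=organization)
department2 = Proxy("department 2", parent=organization)
```

A request reaches the Internet only if every proxy on its path allows it, so a restriction on the organizational proxy applies to all departments, while a departmental restriction affects only that department's clients.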