The World Wide Web Security FAQ

8. Server Logs and Privacy

(Thanks to Bob Bagwill, who contributed many of the Q&As in this section)

Q50: What information do readers reveal that they might want to keep private?

Most servers log every access. The log usually includes the IP address and/or host name, the time of the download, the user's name (if known through user authentication or obtained via the identd protocol), the URL requested (including the values of any variables from a form submitted using the GET method), the status of the request, and the size of the data transmitted. Some browsers also report the client software the reader is using, the URL that the client came from, and the user's e-mail address. Servers can log this information as well, or make it available to CGI scripts. Most WWW clients are probably run from single-user machines, so a download can often be attributed to an individual. Revealing any of these items could be damaging to a reader.

For example, XYZ.com downloading financial reports on ABC.com could signal a corporate takeover. Accesses to an internal job posting reveal who might be interested in changing jobs. The time a cartoon was downloaded may reveal that the reader is misusing company resources. A referral log entry might contain something like:

 file://prez.xyz.com/hotlists/stocks2sellshort.html -> http://www.xyz.com/

The pattern of accesses made by an individual can reveal how they intend to use the information. And the input to searches can be particularly revealing.
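
To make the fields listed above concrete, here is a minimal Python sketch that pulls them out of a single access-log line. It assumes the server writes the Common Log Format (host, ident, user, time, request, status, size); the sample line, the host name pc17.xyz.com, the user jsmith, and the function name parse_clf_line are all invented for illustration and do not come from any real log.

```python
import re
from urllib.parse import urlsplit, parse_qs

# Assumed layout (Common Log Format):
#   host ident authuser [date] "request" status bytes
CLF_PATTERN = re.compile(
    r'^(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)$'
)

def parse_clf_line(line):
    """Return a dict of the fields in one log line, or None if it does not match."""
    match = CLF_PATTERN.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    # The request reads "METHOD URL PROTOCOL"; the URL carries any GET form values.
    parts = fields["request"].split()
    url = parts[1] if len(parts) >= 2 else ""
    split_url = urlsplit(url)
    fields["path"] = split_url.path
    fields["form_values"] = parse_qs(split_url.query)
    return fields

# Hypothetical log entry, invented for illustration.
sample = ('pc17.xyz.com - jsmith [26/Apr/1996:10:25:00 -0400] '
          '"GET /cgi-bin/search?q=merger+ABC HTTP/1.0" 200 2326')

entry = parse_clf_line(sample)
for name in ("host", "user", "time", "path", "form_values", "status", "size"):
    print(f"{name}: {entry[name]}")
```

A single line is enough to tie a named user on a named machine to a search for "merger ABC" at a specific minute, which is why search input in GET requests deserves particular care.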

Another way Web usage can be revealed locally is via browser history, hotlists, and cache. If someone has access to the reader's machine, they can check the contents of those databases. An obvious example is shared machines in an open lab or public library.

Proxy servers used for access to Web services outside an organization's firewall are in a particularly sensitive position. A proxy server will log every access to the outside Web made by every member of the organization, recording both the IP address of the host making the request and the requested URL. A carelessly managed proxy server can therefore represent a significant invasion of privacy.

Q51: Do I need to respect my readers' privacy?

Yes. One of the requirements of responsible net citizenship is respecting the privacy of others. Just as you don't forward or post private email without the author's consent, in general you shouldn't use or post Web usage statistics that can be attributed to an individual.

If you are a government site, you may be required by law to protect the privacy of your readers. For example, U.S. Federal agencies are not allowed to collect or publish many types of data about their clients.

In most U.S. states, it is illegal for libraries and video stores to sell or otherwise distribute records of the materials that patrons have checked out. While the courts have yet to apply the same legal standard to electronic information services, it is not unreasonable for users to have the same expectation of privacy on the Web. In other countries, for example Germany, the law explicitly forbids the disclosure of online access lists. If your site chooses to use the Web logs to populate your mailing lists or to resell to other businesses, make sure you clearly advertise that fact.

Q52: How do I avoid collecting too much information?

One of the requirements of your Web site may be to collect statistics on usage to provide data to the organization and to justify Web site resources. In general, collecting information about accesses by individuals is probably not warranted or even useful.

The easiest way to avoid collecting too much information is to use a server that allows you to tailor the output logs, so that you can throw away everything but the essentials. Another way is to regularly summarize and discard the raw logs. Since the logs of popular sites tend to grow quickly, you probably will need to do that anyway.
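
As a rough illustration of "summarize and discard", the following Python sketch reduces raw log lines to per-day, per-document counts and byte totals. It assumes the Common Log Format; the function name summarize and the sample lines are hypothetical. Host names, user names, times of day, and GET form values never reach the summary.

```python
import re
from collections import Counter

# Assumed layout (Common Log Format):
#   host ident authuser [date] "request" status bytes
# Only the day, the URL, and the size are captured; host and user are ignored.
CLF_PATTERN = re.compile(
    r'^\S+ \S+ \S+ \[(?P<day>[^:\]]+)[^\]]*\] '
    r'"\S+ (?P<url>[^\s"]+)[^"]*" \d{3} (?P<size>\d+|-)$'
)

def summarize(lines):
    """Count requests and bytes per (day, path), dropping everything else."""
    hits = Counter()
    volume = Counter()
    for line in lines:
        match = CLF_PATTERN.match(line.strip())
        if match is None:
            continue  # skip lines that do not fit the assumed format
        path = match["url"].split("?", 1)[0]  # discard GET form values
        key = (match["day"], path)
        hits[key] += 1
        if match["size"] != "-":
            volume[key] += int(match["size"])
    return hits, volume

# Hypothetical raw log lines, invented for illustration.
raw_log = [
    'pc17.xyz.com - jsmith [26/Apr/1996:10:25:00 -0400] '
    '"GET /cgi-bin/search?q=merger+ABC HTTP/1.0" 200 2326',
    'pc42.xyz.com - - [26/Apr/1996:10:31:12 -0400] '
    '"GET /cgi-bin/search?q=jobs HTTP/1.0" 200 1810',
    'host9.abc.com - - [26/Apr/1996:11:02:45 -0400] '
    '"GET /reports/q1.html HTTP/1.0" 200 10452',
]

hits, volume = summarize(raw_log)
for (day, path), count in sorted(hits.items()):
    print(day, path, count, volume[(day, path)])
```

Once a summary like this has been written out, the raw log can be deleted or rotated, so the site keeps its usage statistics without keeping records that point to individual readers.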

Q53: How do I protect my readers' privacy?

There are two classes of readers: outsiders reading your documents, and insiders reading both your documents and outside documents.

You can protect outsiders by summarizing your logs. You can help protect insiders by:

having a clear site policy on Web usage.
educating them about the site policy and risks of Web usage.
using a site-wide proxy cache to hide the identity of individual hosts from outside servers.

If your site does not want to reveal certain Web accesses from your site's domain, you may need to get Web client accounts from another Internet provider that can provide anonymous access.

Lincoln D. Stein
Whitehead Institute/MIT Center for Genome Research

Last modified: Fri Apr 26 10:25:00 EDT 1996