Especially with the advent of E-commerce, privacy has become an important issue for everyone using the Internet -- whether or not they know it. (That so many users don't know it is itself one of the biggest issues in Internet privacy.)
This page is not meant to be a detailed explanation of Internet security and privacy issues. However, I will talk about some of the technical side of the Internet -- I have to, in order to explain how you can help protect yourself from invasions of privacy. I can't describe all of the possible threats, but I will talk about a couple of the common ones. I have also described what my Web server keeps track of -- which isn't much at all, and certainly not enough to identify you as a visitor.
To borrow a term from its designers, the World-Wide Web is stateless.
This has nothing to do with the international reach of the Internet -- it just means that Web servers do not normally remember browsers from one request to the next. As a matter of fact, a server will not necessarily remember your browser even while it is sending you a single page: even a page as simple as this one requires several requests to the server, one for the HTML itself and one for each image on the page. Web servers simply don't have enough storage space or computing capacity to remember every request in a useful manner.
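To make the "several requests per page" point concrete, here is a small Python sketch that lists the separate requests a browser would need for one page. The page markup and the file names in it are made up for illustration, and the sketch ignores caching and anything other than images.

```python
from html.parser import HTMLParser


class ImageCounter(HTMLParser):
    """Collect the src attribute of every <img> tag in a page."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.images.append(src)


# Hypothetical page, not the markup of any real site.
PAGE = """<html><body>
<img src="/images/logo.gif">
<p>Some text</p>
<img src="/images/photo.jpg">
</body></html>"""


def requests_needed(html):
    """Return one entry per request: the HTML itself, then each image."""
    counter = ImageCounter()
    counter.feed(html)
    return ["(the HTML page)"] + counter.images


needed = requests_needed(PAGE)
print(len(needed), "separate requests:", needed)
```

Each of those requests arrives at the server on its own, and nothing in the request itself ties it to the others.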
There is a record kept of every access to a Web site. Every Web
server keeps an access log and an error log for maintenance and debugging.
However, the server log contains very little that could be used to
identify the user who accessed the site. A typical access log entry looks
like this:
123.456.789.012 - - [00/Jun/2000:24:00:00 -0400] www.drreddy.com "GET /shots/fifth.html HTTP/1.0" 200 0000 "http://www.google.com/search?q=fifth+disease&meta=lr%3D%26hl%3Den&btnG=Google+Search" "Mozilla/4.06 [en] (Win98; I)"
(This is a real entry from my server log -- except that I have changed
enough information that the person whose access is recorded cannot be
identified.)
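For readers who like to see things taken apart, here is a rough Python sketch that splits an entry like the one above into named fields. The field names (ip, user, referrer, and so on) are my own labels, and the pattern assumes the exact layout of this one example; other servers can be configured to log different fields in a different order.

```python
import re
from urllib.parse import parse_qs, urlsplit

# Pattern written to fit the sample entry only; the group names are
# illustrative labels, not anything the server itself defines.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] (?P<host>\S+) '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

ENTRY = (
    '123.456.789.012 - - [00/Jun/2000:24:00:00 -0400] www.drreddy.com '
    '"GET /shots/fifth.html HTTP/1.0" 200 0000 '
    '"http://www.google.com/search?q=fifth+disease&meta=lr%3D%26hl%3Den'
    '&btnG=Google+Search" "Mozilla/4.06 [en] (Win98; I)"'
)


def parse_entry(line):
    """Return a dict of fields, or None if the line does not fit the pattern."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


fields = parse_entry(ENTRY)
for name, value in fields.items():
    print(f"{name:9} {value}")

# The referring URL can reveal what the visitor searched for.
query = parse_qs(urlsplit(fields["referrer"]).query)
print("search terms:", query.get("q", ["(none)"])[0])
```

Notice that nothing in the entry is a name or an E-mail address: there is a network address, a time, the page requested, the page the visitor came from, and the browser type.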
The most important pieces of information here are:
The easiest way for a Webmaster to gather information on users is to limit access to the site to "authorized" users. These users are given user IDs and passwords, which they must then use before the server will send them restricted pages.
When you enter an ID and password in your browser, the browser saves them for as long as you have it running. (You can clear IDs and passwords by closing the browser program, then restarting it. Internet Explorer offers the option of storing the ID and password on your computer for use in future sessions.) Being stateless, the server does not remember your ID and password for each access (the ID is recorded in the server log, but not the password) -- but the browser sends your ID and password as part of every request to that server.
The privacy risk of ID/password identification depends on what information you have to give the server -- and the Webmaster -- to obtain an ID. Often all you need to give is your E-mail address (but then you need to think about what the Webmaster may do with that information... when I get E-mail addresses from visitors I use them only to reply to their questions, but we've all heard of Webmasters collecting E-mail addresses for spam lists). Some sites "require" a lot more information, and you need to think carefully about what information you're giving out -- especially since the passwords are not encrypted, and so can be captured by anyone smart enough to intercept Internet transmissions.
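To show why an intercepted password is so easy to read, here is a short Python sketch. It assumes the site uses HTTP "Basic" authentication, which is the common scheme that matches the description above; I have not named a particular scheme in the text, so treat that as an assumption. The ID and password are made up.

```python
import base64


def basic_auth_header(user_id, password):
    """Build the value a browser would send in the Authorization header."""
    token = base64.b64encode(f"{user_id}:{password}".encode("utf-8"))
    return "Basic " + token.decode("ascii")


def recover_credentials(header_value):
    """What an eavesdropper could do: reverse the header with no key at all."""
    _scheme, token = header_value.split(" ", 1)
    decoded = base64.b64decode(token).decode("utf-8")
    user_id, password = decoded.split(":", 1)
    return user_id, password


# Hypothetical credentials for illustration only.
header = basic_auth_header("visitor", "opensesame")
print("Authorization:", header)
print("recovered:", recover_credentials(header))
```

The header looks scrambled, but it is only a reversible text encoding. Because the same header goes out with every request to that server, there are many chances to capture it.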
A fairly popular way to track users is to generate a new home page for everyone who visits a site. The page looks the same to every visitor, but each page's links to other pages on the site contain a string of letters and numbers unique to that page. If you bookmark one of those coded-URL pages, the server will know you're back every time you use the bookmark, and can use the coded-URL information to track your travels through the site.
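Here is a minimal Python sketch of how coded URLs could work on the server side. The function names, the "id" parameter, and the sample link are all hypothetical; real sites use many different schemes.

```python
import re
import secrets


def new_visitor_code():
    """Generate a random code for one freshly built home page."""
    return secrets.token_hex(8)


def code_links(html, code):
    """Append the code to every link that points within the same site."""
    return re.sub(
        r'href="(/[^"?]*)"',
        lambda m: f'href="{m.group(1)}?id={code}"',
        html,
    )


def visitor_from_url(url):
    """Recover the code from a requested URL, or None if there is none."""
    match = re.search(r"[?&]id=([0-9a-f]+)", url)
    return match.group(1) if match else None


# Hypothetical home page fragment.
HOME = '<a href="/shots/fifth.html">Fifth disease</a>'

code = new_visitor_code()
coded_page = code_links(HOME, code)
print(coded_page)

# A bookmark of a coded URL carries the same code back to the server later.
print(visitor_from_url(f"/shots/fifth.html?id={code}") == code)
```

Every visitor sees the same page, but each copy carries a different code in its links, so any later request using one of those links identifies which copy it came from.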
This kind of tracking is not easy to block completely. One way to block it is to go to the home page every time you use the site -- and make sure you reload the home page for every set of accesses to get different coded URLs. Of course, this makes using that site much less convenient: you need to balance your desire for privacy against the effort needed to ensure it (but then, that could be said of Internet use these days, too...).
The cookie (or magic cookie in Netscape parlance) is a great way to make Internet use -- and E-commerce -- convenient. It is also one of the most significant threats to the privacy of unwary Web surfers.
A cookie is a (usually coded) string of letters and numbers that a Web server sends to your browser when the browser requests a page (or image, or sound file, or any other file). Your browser stores this string in a "cookie jar" -- a file on your computer (Netscape browsers use the file COOKIE.TXT). After the cookie is set, the cookie is sent with the request every time your browser gets a page, image, or anything else from that server.
There are some restrictions on who can get what out of the cookie jar. Browsers will send cookies only to the server that set them, or to other servers in the same domain. However, that still leaves a lot of latitude as far as cookie-linked identifying information is concerned.
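As a rough illustration, the Python sketch below builds the kind of header a server uses to set a cookie, then applies a simplified version of the "same server or same domain" rule. The cookie name, value, and domains are made up, and real browsers apply more detailed matching rules than this one function does.

```python
from http.cookies import SimpleCookie

# What a server might send to set a cookie (hypothetical name and value).
cookie = SimpleCookie()
cookie["visitor"] = "a1b2c3"
cookie["visitor"]["domain"] = ".example.com"
cookie["visitor"]["path"] = "/"
print(cookie.output())


def browser_would_send(cookie_domain, request_host):
    """Simplified rule: send only to the cookie's domain or a host inside it."""
    cookie_domain = cookie_domain.lstrip(".").lower()
    request_host = request_host.lower()
    return (
        request_host == cookie_domain
        or request_host.endswith("." + cookie_domain)
    )


for host in ("www.example.com", "shop.example.com", "www.example.org"):
    print(host, browser_would_send(".example.com", host))
```

Under this rule a cookie set for a whole domain goes to every server in that domain, which is the latitude mentioned above.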
Cookies can be very useful. For example, some sites use cookies to store encoded passwords for use in future sessions (one example is the New York Times Web site, which also uses cookies to identify "premium" users with access to paid features). However, they can easily be used to track your Internet use. One now-famous example is the Web advertising service doubleclick.net, which sets or reads a cookie every time it displays an ad on a Web page. Since the request to the ad server includes the URL of the page the ad appeared on, the ad server can compile a list of every site you visit that carries its ads. This allows the service to tailor its ads to your tastes. It also gives the service a lot of information about you, just by looking at the sites you visit.
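The bookkeeping involved is very simple. The Python sketch below is a toy model of an ad server, not a description of how any real advertising service is built; the class, the cookie IDs, and the page URLs are all invented for illustration.

```python
import secrets
from collections import defaultdict


class AdServer:
    """Toy ad server: one cookie ID per browser, one history list per ID."""

    def __init__(self):
        self.history = defaultdict(list)

    def serve_ad(self, cookie_id, page_url):
        """Handle one ad request and return the cookie ID the browser keeps.

        cookie_id is None when the browser has no cookie from us yet;
        page_url is the page the ad appeared on, taken from the request.
        """
        if cookie_id is None:
            cookie_id = secrets.token_hex(8)  # set a new cookie
        self.history[cookie_id].append(page_url)
        return cookie_id


ads = AdServer()
browser_cookie = None  # a browser that has never seen this ad server

# Hypothetical pages on unrelated sites that all carry the same service's ads.
for page in (
    "http://news.example/politics.html",
    "http://health.example/fifth-disease.html",
    "http://shop.example/baby-clothes.html",
):
    browser_cookie = ads.serve_ad(browser_cookie, page)

print(ads.history[browser_cookie])
```

The three sites never share anything with each other, yet the ad server ends up with a single list covering all of them.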
You can read the COOKIE.TXT file (or its equivalent) with any text editor. Modifying the cookie file directly is risky, since some of the characters in the file are non-printing "control" characters and disturbing the contents may make the entire file unreadable. There are also commercial and shareware programs available that will allow you to read the cookie file and even modify the cookies.
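If you would rather look at the file with a program than a text editor, here is a read-only Python sketch. It assumes the file uses the tab-separated, seven-field layout of Netscape-style cookie files (domain, subdomain flag, path, secure flag, expiry time, name, value); the tabs are non-printing characters of the kind mentioned above. The sample line is made up, and the sketch never writes to the file.

```python
from collections import namedtuple

Cookie = namedtuple(
    "Cookie", "domain subdomains path secure expires name value"
)

# Hypothetical file contents; fields are separated by tab characters.
SAMPLE = (
    "# Netscape HTTP Cookie File\n"
    "\n"
    ".example.com\tTRUE\t/\tFALSE\t946684799\tvisitor\ta1b2c3\n"
)


def parse_cookie_file(text):
    """Return a list of Cookie tuples, skipping comments and odd lines."""
    cookies = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != 7:
            continue  # not in the assumed layout; leave it alone
        domain, subdomains, path, secure, expires, name, value = fields
        cookies.append(
            Cookie(domain, subdomains == "TRUE", path,
                   secure == "TRUE", int(expires), name, value)
        )
    return cookies


for c in parse_cookie_file(SAMPLE):
    print(f"{c.domain} set {c.name}={c.value} (path {c.path})")
```

To use it on a real file, read the file's text and pass it to parse_cookie_file; if your browser stores cookies in some other layout, the function will simply return an empty list.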
You can also set most advanced browsers (Netscape versions 3.0 and higher, and Internet Explorer from at least version 4.0 on) to warn you when a cookie is being set, and ask you if you want the cookie saved or not. (In Netscape version 3.x, select "Network Preferences" from the Options menu, then select the "Protocols" tab and check the box marked "Show an Alert Before Accepting a Cookie". In Netscape version 4.x, select "Preferences" from the Edit menu, then select Advanced and check the "Warn Me Before Accepting a Cookie" box.) Once you do this, you will get an alert box whenever a cookie is set: the box will show you what the cookie contains, the name of the server that wants to set the cookie, and the servers that can read the cookie if you allow it to be set. It will then ask you if you want the cookie. If you do, click "OK"; if you don't, click "Cancel".
If you are really concerned about privacy, you can also block all cookies from your computer.
Browsers store cookies for a particular session in working memory; typically (although this may change) they do not write the new or changed cookies to a file until you close the browser at the end of a session. If you set the cookie file to be "read-only" (with the ATTRIB command in DOS, or using the File Manager or Explorer in Windows -- I don't know Macs well enough to tell you how to do it there), you will prevent your browser from saving new or changed cookies.
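The same read-only setting can be applied from a short script. The Python sketch below demonstrates it on a temporary stand-in file rather than a real cookie file; to use it for real you would pass the path of your own browser's cookie file, which I am not assuming here. The helper names are my own.

```python
import os
import stat
import tempfile


def make_read_only(path):
    """Clear the write permission, like ATTRIB +R in DOS."""
    os.chmod(path, stat.S_IREAD)


def make_writable(path):
    """Undo make_read_only so the file can be changed or deleted again."""
    os.chmod(path, stat.S_IREAD | stat.S_IWRITE)


def is_read_only(path):
    """True if the owner's write permission bit is off."""
    return not (os.stat(path).st_mode & stat.S_IWRITE)


# Demonstrate on a temporary stand-in file, not a real cookie file.
handle, demo_path = tempfile.mkstemp(suffix=".txt")
os.close(handle)

make_read_only(demo_path)
print("read-only:", is_read_only(demo_path))

make_writable(demo_path)
print("read-only:", is_read_only(demo_path))
os.remove(demo_path)
```

Whether a particular browser respects the read-only setting is something to check for yourself; the sketch only changes the file's permission.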
You may want to allow a few cookies, and block the rest. To do this, you can
This will let the browser save all the cookies you want to keep, but prevent it from saving others at all. If you later find a site whose cookies you want to keep, just repeat the process for that site.
Not much...
I do keep a condensed version of the server log, which has only one entry per user no matter how many pages and images you access in a given session. I keep track of what page each user visits first in a session (that is the best way for me to see what pages are popular); I do not regularly check to see which sites refer users to my site (although I may start to do that just to see where people find out about my site). I do get E-mail addresses from people who write to me with comments and questions -- I need the addresses to reply to mail, but I do not keep the addresses except in my mailbox. I'm sufficiently concerned about privacy that I wouldn't want to collect and store any other information -- and if I were visiting another site, I wouldn't want other information about me captured, either. There really isn't any good reason that I can see to collect identifying information about the visitors to a medical Web site, especially without their permission. (You might want to think about that when you're surfing medical sites -- or any other Web sites...)
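For the curious, the condensing step could look something like the Python sketch below. This is an illustration of the idea, not the actual script I run: it assumes each log entry has already been reduced to a (visitor address, requested path) pair, treats one address as one visitor, and uses made-up sample data.

```python
from collections import Counter

IMAGE_SUFFIXES = (".gif", ".jpg", ".jpeg", ".png")


def first_pages(entries):
    """Keep one entry per visitor: the first page (not image) requested."""
    first = {}
    for visitor, path in entries:
        if path.lower().endswith(IMAGE_SUFFIXES):
            continue  # images are fetched as part of a page, not chosen
        first.setdefault(visitor, path)
    return first


# Hypothetical (address, path) pairs in the order they were logged.
ENTRIES = [
    ("10.0.0.1", "/shots/fifth.html"),
    ("10.0.0.1", "/images/logo.gif"),
    ("10.0.0.2", "/index.html"),
    ("10.0.0.1", "/index.html"),
    ("10.0.0.3", "/shots/fifth.html"),
]

condensed = first_pages(ENTRIES)
print(condensed)

# Count how often each page was someone's first stop.
print(Counter(condensed.values()).most_common())
```

The condensed record holds less than the full log does: one address and one page per visitor, and nothing that names anyone.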