Anarchy Internet University, Fall Session '93 - Lesson 2

What you want is what you get is what you want or, I never Metadata I didn't like.

(An)Archie, Veronica, and Jughead-- You don't need a Weatherbee to know which way the wind blows

Remember at the end of the film Brazil, when Michael Palin was wearing that funny mask and torturing Sam, all part of the Information Retrieval process? You might wish that you could take to the internet with a good pair of needlenose and wrest its riches out by force, but fortunately its physical resources are so scattered that you'd probably do more damage to your PC or work site (which in some cases wouldn't be that bad a thing :>) than convince the net to fork over what you're looking for.

Net hacks have written tools that allow you to search metadata--data about data. Keeping track of it all is a key problem as network access points proliferate (the internet is 'growing' at 12% per month): How do you keep track of such vast volumes of data, text, pictures, sounds, movies, numbers? The task of organizing such a heterogeneous mix of resources has never been attempted before and is so complex that it will never fully succeed :) but it is an interesting library research problem.

The tools that go out and graze the net use the various protocols. We already talked about ftp, the File Transfer Protocol. There are many other protocols on the internet that serve many different functions that the end user never need know about--this message came to you courtesy of SMTP, the Simple Mail Transfer Protocol (and the number 25). A handful of the protocols, though, are worth a cursory familiarity if you want to optimize your time in front of the screen.

PROTOCOL     ACRONYM EXPANSION         CLIENT   WHAT IT DOES
-------------------------------------------------------------------------
FTP          File Transfer                ftp   Get/Put files on remote site
             Protocol                           Remote file system manipulation.

GOPHER       Gopher Information Server gopher   Browse menu hierarchy and
             Protocol                 xgopher   retrieve data. Character based, 
             Not an acronym :)                  graphics by separate program.

NNTP         Network News Transfer      nn,rn   Read/Post news articles.
             Protocol       

WAIS         Wide Area Information waissearch   Search and retrieve documents
             Server                     xwais   Networked database access.

HTTP         HyperText Transfer       xmosaic   Browse and search networked
             Protocol                   cello   hypertext using above protocols.
                                        midas   Unix X, Mac, Windows clients.
                                        viola   incorporates support for movies,
                                         lynx   images, sound, point and click
                                        tkWWW   graphic interface. slick.
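
Each protocol in the table above also has a conventional port number, the way SMTP has its 25. Here is a minimal Python sketch of that lookup. The names WELL_KNOWN_PORTS and port_for are made up for illustration, and apart from SMTP's 25 the numbers are the standard assignments, not something taken from this lesson.

```python
# Conventional TCP ports for the protocols in the table above.
# SMTP's 25 comes from the lesson; the others are the standard assignments.
WELL_KNOWN_PORTS = {
    "ftp": 21,
    "smtp": 25,
    "gopher": 70,
    "http": 80,
    "nntp": 119,
    "wais": 210,  # registered under the name z39.50
}


def port_for(protocol):
    """Return the conventional port for a protocol name (case-insensitive)."""
    return WELL_KNOWN_PORTS[protocol.strip().lower()]


if __name__ == "__main__":
    for name in sorted(WELL_KNOWN_PORTS):
        print(name, port_for(name))
```

A server can listen anywhere it likes, so treat these as defaults, not guarantees.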

Each protocol has its own meta-data search method. Computer geeks have taken the name for the ftp meta-data search system, archie, and extended it to include gopher's search system, veronica, and just last week I heard announced a meta-data knowbot for HTTP, jughead. Remember what Anarchy said: "You don't need a Weatherbee to know which way the wind blows."

ARCHIE - FTP

The second most commonly used protocol on the internet is FTP. In the last lesson, we talked about how to log onto an ftp site and retrieve a file. Unless you have a network of friends who constantly use the most popular protocol on the internet, SMTP, to keep you up to date on what's new out there, you have to have ways to find what you want to ftp. Archie is the tool you use to do this. It's easy. Just type:

% archie country-codes

You get back a set of citations that include ftp sites, pathnames and filenames. I've edited this for space. The real query has many more hits, but I've left some to show you how the country codes distribute in an archie query. For the first cite, the host is plaza.aarnet.edu.au, the pathname is /usenet/FAQs/alt.answers/mail and the filename is country-codes. (FAQ = Frequently Asked Questions.) Read the FAQs for stuff you are interested in and you won't bother people with the most common questions.

Host plaza.aarnet.edu.au               AUSTRALIA -too far away
    Location: /usenet/FAQs/alt.answers/mail
           FILE -r--r--r--      19681  Oct 13 02:10  country-codes

Host rzsun2.informatik.uni-hamburg.de  GERMANY -too far away
    Location: /pub/doc/news.answers/mail
           FILE -rw-r--r--      18947  Oct 13 10:27  country-codes

Host bloom-picayune.mit.edu            MIT - a good, fast bet
    Location: /pub/usenet-by-group/alt.answers/mail
           FILE -rw-rw-r--      19681  Oct 13 02:10  country-codes

Host charon.mit.edu                    Mega MIT server - may be busy but fast
    Location: /pub/usenet-by-group/alt.answers/mail
           FILE -rw-rw-r--      19612  Sep  1 06:30  country-codes

Host sunsite.unc.edu                   Mega Univ of NC server - fast and busy
    Location: /pub/docs/about-the-net
           FILE -rw-r--r--      20137  Jun  3 15:40  country-codes

Host grasp1.univ-lyon1.fr              FRANCE - blew '68 so why ftp from them?
    Location: /pub/faq-by-newsgroup/alt/alt.internet.services/mail
           FILE -rw-r--r--      19560  Sep  1 05:01  country-codes

Host han.hana.nm.kr                    KR? check out the country-codes for info
    Location: /netinfo/sh.cs.net
           FILE -rw-r--r--      16442  Jan 14 1993  country-codes

Host nnsc.nsf.net                      .net = network support site
    Location: /info
           FILE -rw-rw-r--      17455  Oct 20 1992  country-codes

Host svin02.info.win.tue.nl            NETHERLANDS - slow link
    Location: /pub/usenet/news.answers/mail
           FILE -rw-r--r--      19652  Sep  1 02:00  country-codes

Host ugle.unit.no                      NORWAY - exotic, but slow link
    Location: /faq/comp.answers/mail
           FILE -rw-rw-r--      19633  Sep  1 05:04  country-codes

Usually the information in the citation helps you decide if it's worth your time to go check it out: you get the file size in bytes (characters), the last modification date and its location. You need to check out the location of the site too. Although the network is fast, you are limited in the speed of your search by the slowest link in the virtual circuit between you and the ftp site you are on. Usually, the closer the ftp site is to you physically, the faster the transfer and response will be, all things being equal. If I can get something from Berkeley or MIT instead of from Finland, New Zealand or Taiwan, I'll get it from the backbone instead of from the spines. Anyway, if you want to find out the country codes so you don't ftp to the other side of the world, this archie search will do it for you.
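
The citations are regular enough that a few lines of Python can pull them apart and put the likely-close sites first. This is only a sketch: parse_archie and nearest_first are hypothetical names, the parser assumes the Host / Location / FILE layout shown in the listing above, and sorting by top-level domain is a crude stand-in for real network distance.

```python
import re

# Two citations copied from the listing above (annotations stripped).
SAMPLE = """\
Host plaza.aarnet.edu.au
    Location: /usenet/FAQs/alt.answers/mail
           FILE -r--r--r--      19681  Oct 13 02:10  country-codes

Host bloom-picayune.mit.edu
    Location: /pub/usenet-by-group/alt.answers/mail
           FILE -rw-rw-r--      19681  Oct 13 02:10  country-codes
"""

# permissions, size in bytes, three-part date, file name
FILE_LINE = re.compile(r"FILE\s+(\S+)\s+(\d+)\s+(\w{3}\s+\d+\s+\S+)\s+(\S+)")


def parse_archie(text):
    """Turn archie output into a list of dicts, one per FILE line."""
    hits = []
    host = location = None
    for line in text.splitlines():
        words = line.split()
        if not words:
            continue
        if words[0] == "Host" and len(words) > 1:
            host = words[1]          # ignore anything after the host name
        elif words[0] == "Location:" and len(words) > 1:
            location = words[1]
        elif words[0] == "FILE":
            match = FILE_LINE.search(line)
            if match and host and location:
                hits.append({
                    "host": host,
                    "location": location,
                    "size": int(match.group(2)),
                    "modified": match.group(3),
                    "name": match.group(4),
                })
    return hits


def nearest_first(hits, home_domains=("edu", "net")):
    """Stable sort: hosts whose top-level domain looks local come first."""
    def is_far(hit):
        return hit["host"].rsplit(".", 1)[-1] not in home_domains
    return sorted(hits, key=is_far)
```

If you live somewhere other than the US backbone, pass your own home_domains, for example ("au",).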

All you have to do is:

1) ftp to the ftp site,
2) cd to the pathname,
3) get file.
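
The three steps above map straight onto Python's standard ftplib module. A minimal sketch, assuming the site accepts anonymous logins; fetch and its ftp_factory parameter are hypothetical names, and the factory is only there so you can try the function without a network.

```python
from ftplib import FTP


def fetch(host, path, filename, ftp_factory=FTP):
    """Anonymous ftp in three steps; returns the file contents as bytes."""
    chunks = []
    ftp = ftp_factory(host)              # 1) ftp to the ftp site
    try:
        ftp.login()                      # no arguments means anonymous
        ftp.cwd(path)                    # 2) cd to the pathname
        ftp.retrbinary("RETR " + filename, chunks.append)  # 3) get file
    finally:
        ftp.quit()                       # say goodbye even if a step failed
    return b"".join(chunks)


# Example call, using a host and path from the archie listing
# (whether that site still answers is another matter):
#
#   data = fetch("bloom-picayune.mit.edu",
#                "/pub/usenet-by-group/alt.answers/mail",
#                "country-codes")
```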

VERONICA - GOPHER

Gopher is a menu-driven networked information retrieval system developed at the University of Minnesota. I never read the manual on gopher; it is totally intuitive to use if you have ever used a computer for anything. All you have to do is hit return and use the up/down arrow keys, pgup and pgdn and the 'u' key to go up levels. Otherwise follow instructions and you can't go wrong. Have fun in 'gopherspace.'

The Veronica server is located off of the root gopher at gopher.tc.umn.edu. All you have to do is:

1) run gopher. Typing gopher automatically connects to the root gopher server at the Univ of Minn: gopher.tc.umn.edu. You can also connect to any other gopher server by typing '% gopher host.id.domain' where host.id.domain is the real id for the system you are interested in.

% gopher

2) select menu item 8, Other Gopher and Information Servers/. Slink down the menu with the down arrow key till you get to number 8 and then hit return. You can also type 8 and then return. (Names ending with a / mean that there is another gopher level available when you hit return for that item. Names without a / are terminal, in that they point to a resource.)

3) select item 2, Search titles in Gopherspace using veronica/.

4) You will get a list of available search methods. You can experiment to find which works better for you. Some search methods return results that are 'protected' and that you can't read. But you won't know until you try :>. All of these menu items (ending with a <?>) are searchable indexes. If you hit return on them, you will be prompted for a search term. When you enter it, blammo, a new gopher level is created for you of all gopher items that matched your query.
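
Under the menus, a gopher server just sends lines of tab-separated fields: a one-character type glued to the title, then a selector, a host and a port. Type 1 is another menu (the trailing /), type 7 is a searchable index (the <?>). A rough Python sketch of reading such a menu and building a search request follows. The sample menu, its selectors and the host veronica.example are invented for illustration; only the line layout comes from the gopher protocol.

```python
# A made-up two-item menu in gopher's wire format.
SAMPLE_MENU = (
    "1Other Gopher and Information Servers\t1/Other\tgopher.tc.umn.edu\t70\r\n"
    "7Search titles in Gopherspace using veronica\tveronica-search\t"
    "veronica.example\t70\r\n"
    ".\r\n"
)

# How the character client decorates item types on screen.
SUFFIXES = {"1": "/", "7": " <?>"}


def parse_menu(raw):
    """Split a raw gopher menu into a list of item dicts."""
    items = []
    for line in raw.splitlines():
        if line == ".":                  # a lone dot ends the menu
            break
        fields = line.split("\t")
        if len(fields) < 4 or not fields[0]:
            continue                     # skip anything malformed
        kind, title = fields[0][0], fields[0][1:]
        items.append({
            "type": kind,
            "title": title + SUFFIXES.get(kind, ""),
            "selector": fields[1],
            "host": fields[2],
            "port": int(fields[3]),
        })
    return items


def search_request(selector, words):
    """Bytes to send to a type 7 item: selector, tab, search words, CRLF."""
    return (selector + "\t" + words + "\r\n").encode("ascii")
```

Send the bytes from search_request to the item's host and port, and what comes back is another menu you can feed to parse_menu: the 'new gopher level' of matches.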

WAIS

WAIS is a searching protocol based on the NISO Z39.50 Information Retrieval standard. It currently exists as a networked set of database servers that can register with central sites. You search by querying the central site (now quake.think.com, a centralization that cannot continue for long if the WAIS server community keeps growing as it has) and it returns a set of servers that you would use in a second query, which would return documents or referrals to other sources. It, like all of the protocols discussed, is a client-server system, which means that there are clients that operate on many different platforms: PCs, Windows, Macs, probably VMS, as well as several UNIX X front-ends.

Most all of this software is freely distributed in source form (this means that you have the human-readable program) which means that anyone who knows the programming language can alter it to suit their needs. This public domain software is cool because it's free, it's a standard that no corporation has control over and the most popular serious operating system, UNIX, is practically free in source form. The only catch is that you cannot use the code for profit, unless you pay a hefty fee, but no one ever said that undermining the state was a profit-motivated endeavor, at least as far as the law is concerned :).

You can ftp to think.com and wais.com for information and sources for WAIS servers and clients. If you are going to use WAIS to put up content that is non-profit and contributes to making the state an endangered species, drop me an e-mail and I can provide some pro-bono consulting. You basically take your text (and associated images or sounds and stuff) and index it through your own database system or one that comes with WAIS. You then set up a WAIS server.

When someone asks a question of a WAIS client it connects to a server via a special network place (a port) that the WAIS server has been patiently listening to. The server spins off a copy of itself to handle the search and returns to its regimen of listening. The handler process then searches the database and returns a set of matches, or hits, that include a direct key that contains a unique identifier so you have all the data you need to snarf the data file in the next transaction.
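
The listen-then-hand-off pattern is easy to try with Python's standard socketserver module. A sketch only: it hands each connection to a thread, not a forked copy of the server as described above (socketserver.ForkingTCPServer does the fork version on Unix), and the one-line query/reply exchange is invented for the demo and has nothing to do with the real WAIS wire format.

```python
import socket
import socketserver
import threading


class SearchHandler(socketserver.StreamRequestHandler):
    """Runs once per connection, while the main server keeps listening."""

    def handle(self):
        query = self.rfile.readline().decode("ascii", "replace").strip()
        reply = "HITS for " + query + "\r\n"   # stand-in for a real search
        self.wfile.write(reply.encode("ascii", "replace"))


class Server(socketserver.ThreadingTCPServer):
    allow_reuse_address = True
    daemon_threads = True


def start(host="127.0.0.1", port=0):
    """Start listening in the background; port 0 lets the OS pick one."""
    server = Server((host, port), SearchHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def ask(host, port, query):
    """Client side: send one query line, read one reply line."""
    with socket.create_connection((host, port), timeout=5) as conn:
        conn.sendall(query.encode("ascii") + b"\r\n")
        return conn.makefile("rb").readline().decode("ascii").strip()
```

Call start(), read the port it got from server.server_address[1], and point ask() at it. Call server.shutdown() and server.server_close() when you are done.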

This way the server can be STATELESS in that it needn't remember any previous transaction because each transaction contains enough data to completely describe the query. The client puts the hit list on the screen, the user then selects some of the hits to retrieve. The client whisks the direct-access key off to the server, which returns the requested data. This whole transaction session is described in formal internet protocols so that any host that has a Z39.50 server can be queried by a user on any computer that has a Z39.50 client.
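
Statelessness is easier to see in a toy. In the Python sketch below the 'server' is two plain functions that remember nothing between calls: search hands back hits that each carry a key, and retrieve needs only that key. The three documents, the word-count scoring and the function names are all invented for illustration; real Z39.50 is a good deal more elaborate.

```python
# A pretend database: key -> document text.
DOCUMENTS = {
    "doc-1": "Country codes list for mail and ftp hosts.",
    "doc-2": "Gopher menus and how to browse them.",
    "doc-3": "Mail headers explained, with country codes.",
}


def search(query):
    """First transaction: return hits, best score first, each with its key."""
    words = query.lower().split()
    hits = []
    for key, text in DOCUMENTS.items():
        lowered = text.lower()
        score = sum(lowered.count(word) for word in words)
        if score:
            hits.append({"key": key, "score": score, "headline": text[:40]})
    return sorted(hits, key=lambda hit: -hit["score"])


def retrieve(key):
    """Second transaction: the key alone says which document to send."""
    return DOCUMENTS[key]


if __name__ == "__main__":
    for hit in search("country codes"):
        print(hit["key"], hit["score"], retrieve(hit["key"]))
```

Because retrieve never looks at what search did, the two calls could land on different copies of the server and still work.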

JUGHEAD - HTTP

This package has just been announced and it probably won't be functional for several months yet. Today someone mentioned a jughead for gopher, so it's up in the air.

The implementation of HTTP, with clients and servers and a hypertext markup language, makes up the World Wide Web, or WWW, www, W3 or just the web. The cool part is that you can edit a page in vi, entering HTML (the hypertext markup language) descriptions of a page that contains 'hot spot' links that can point to any resource available on the network. One document on anarchy, for example, can point to resources--sound, video, images, text, data retrieval, anything--around the world that make a kind of exhibit that can be accessed by anyone else on the network so equipped. The browsers all incorporate support for ftp, wais and gopher into their interface, so users need only learn the HTTP client.
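
To make the 'hot spot' idea concrete, here is a small page of the kind you might type into vi, plus a few lines of Python that list where its links point. The page text and title are invented, the two link targets reuse hosts mentioned earlier in this lesson, and LinkCollector and hot_spots are hypothetical names; the parser is Python's standard html.parser.

```python
from html.parser import HTMLParser

# A hand-written page with two hot spots: one ftp, one gopher.
PAGE = """<html>
<head><title>Anarchy exhibit</title></head>
<body>
<h1>Anarchy exhibit</h1>
<p>Start with the
<a href="ftp://sunsite.unc.edu/pub/docs/about-the-net/country-codes">country codes</a>,
then browse the <a href="gopher://gopher.tc.umn.edu/">root gopher</a>.</p>
</body>
</html>
"""


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def hot_spots(page):
    """Return the link targets in a page, in the order they appear."""
    collector = LinkCollector()
    collector.feed(page)
    return collector.links


if __name__ == "__main__":
    for target in hot_spots(PAGE):
        print(target)
```

Notice that neither link is an HTTP one: the page is served over HTTP, but its hot spots reach out over ftp and gopher, which is why the browser has to speak all of them.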

The 'so equipped' part is the problem right now. Most of these clients are networked applications, in that they do more than put characters up on a terminal screen like a kermit or procomm type connection does. They interoperate with special servers that run graphics screens, so you have to have a network connection. This kind of network connection transmits so much data (high bandwidth) that most people can't afford it. There are text-based browsers that peruse this hypertext jungle, but they don't have the glitz of the X (unix window system) implementation. Soon enough we will most all have access to high-speed networking, either by the local library or at home, so if you're into this, it shouldn't be a problem.

No test afterward. Go out, drink a beer and netnavigate the state away.

Coming soon--encryption.