Tuesday, May 6, 2008

Databases as Representations (Searching, Hits, Encounters)

(http://archive.salon.com/tech/feature/2001/06/21/google_henziger/print.html)

In order to build a database for a search engine, the search engine company must build an “internal representation” of the web and of the web’s link structure, and build into that representation the potential for rank and hierarchy vis-à-vis individual searches.

This “representation” is based on the company’s “understanding” of what documents are about, i.e. its algorithms are that understanding and a representation of that understanding. They are the part of the search engine that encodes an understanding of how to understand web pages in terms of their potential relevance to a potentially infinite set of potential searches. Talk about potentiality and immanence!

Where potentiality meets hard materiality is when a person conducts a search. Then, the results of that search can only bring back what is immanent in the representation. It cannot bring back what the database does not contain, a document (webpage) that has not been indexed, a word that does not appear in the database, a combination of words that has not been cross-referenced or is not cross-referenceable in the database. The database is potentiality ONLY when the database is being built, when the web crawlers are doing their work. It ceases to be potentiality when queries are conducted: at that exact point in time, the possibilities are not endless. They are vast, but not endless. Not that anyone would notice, but it’s an important distinction (cf. the fact from a post ago that when one searches Google, one only searches about half the searchable content of the web).
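A toy sketch may make the point concrete. What follows is not how Google or any real search engine works; it is a minimal inverted index over three made-up pages, with hypothetical names (`build_index`, `search`, `pages`) chosen only for illustration. It assumes the simplest possible matching rule: a page is a hit only if it contains every word of the query.

```python
from collections import defaultdict

def build_index(pages):
    """The 'crawl': map each word to the set of pages that contain it."""
    index = defaultdict(set)
    for name, text in pages.items():
        for word in text.lower().split():
            index[word].add(name)
    return index

def search(index, query):
    """Return the pages that contain every word of the query."""
    words = query.lower().split()
    if not words:
        return set()
    # Start from a copy of the first word's pages, then narrow word by word.
    hits = set(index.get(words[0], ()))
    for word in words[1:]:
        hits &= index.get(word, set())
    return hits

# Hypothetical, hand-written "web" of three pages.
pages = {
    "page_a": "used car for sale",
    "page_b": "car repair manual",
    "page_c": "train timetable",
}
index = build_index(pages)

print(search(index, "car"))       # pages that mention "car"
print(search(index, "used car"))  # a narrower, more legible wish
print(search(index, "bicycle"))   # a word the index never saw: no hits
```

Once `build_index` has run, the set of possible answers is fixed. A query for a word that was never indexed returns nothing, however real the searcher's wish may be.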

A database is a representation.

The other half of the search that a search engine must “understand” is the search query itself. Here potentiality comes back in the form of the unreadability, unpredictability, illegibility of the search. If the search engine cannot figure out what the search is about, then it returns more random hits. So the query is itself a representation, in the eyes of the search engine, and some queries are more legible than others: legible, that is, as expressions of the searcher’s real wish or desire. “Car” is an ambiguous wish. “Car sex” is a less ambiguous wish. This is true both for human readers and search engines.

So an interesting feature of Beacon is that it puts viewers in the same position that the search engines themselves are in: both are readers of the queries, both are in a position of having to interpret what the query means, what it points to, what it indexes, what it wants. Perhaps the search engine’s reading is a more constrained, instrumental one; perhaps it wants only to know more precisely what the searcher wants to know, get, learn. Beacon’s viewers, by contrast, forced into the position of search engine-as-reader, get to have a more wide-reaching curiosity, get to have pathologizing thoughts, more sociological thoughts, and make the searcher into a general person rather than a singular person. But the search engine uses rules to interpret the query, which is its own form of generalization: making an individual into a general person on the basis of their wants and wishes, insofar as those wants and wishes are able to be articulated with the wants and wishes of other searchers, insofar as the wants and wishes are generalizable. Here the general probably means the repeated: something like that wish has been seen before, so that it can (1) be predicted by the database and the crawlers and (2) be predicted by the program that matches queries to the database, one representation to another.

When a person’s query encounters the search engine database, one representation (of the web as indexed by, say, Google) encounters another (a person’s capacity to render their wish in words). A representation of the web meets a representation of a single person’s wish. The results of the encounter are called “hits.” Why hits? Literally, a hit is where a person’s wish hits, impacts, lands in the database. The search engine then returns these hits as results, but they are literally the trace of an impact.
