5263 - Vector Model Information Retrieval

Theory of Information Retrieval, Florida State University LIS-5263 (Fall, 2003)
Written by Rich Ackerman, September 25, 2003
http://www.hray.com/


This program shows the use of the vector model of information retrieval to search a collection of documents for keywords you supply.

The document set consists of 10 files:

 

Search terms:

The set of normalized term frequencies [f(i,j)] is here. Each entry is a triple of {term frequency, word, document}.
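As a rough illustration of what such a table holds, here is a minimal Python sketch (not the Perl program itself). It assumes f(i,j) is the raw count of word i in document j divided by the count of the most frequent word in that document, which is one common normalization; the page does not say which one the program uses. The function name, the whitespace tokenizer, and the two sample documents are hypothetical.

```python
from collections import Counter


def normalized_term_frequencies(docs):
    """Return a list of (f_ij, word, document) triples.

    Assumption: f(i,j) = count of word i in document j divided by the
    count of the most frequent word in document j.
    """
    triples = []
    for name, text in docs.items():
        counts = Counter(text.lower().split())  # naive whitespace tokenizer
        if not counts:
            continue  # skip empty documents
        max_count = max(counts.values())
        for word, count in sorted(counts.items()):
            triples.append((count / max_count, word, name))
    return triples


# Hypothetical two-document collection, for illustration only.
docs = {
    "doc1.txt": "the cat sat on the mat",
    "doc2.txt": "the dog chased the cat",
}
tf_triples = normalized_term_frequencies(docs)
for triple in tf_triples:
    print(triple)
```

With this normalization the most frequent word in each document always gets f(i,j) = 1, and every other word gets a value between 0 and 1.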

The set of inverse document frequencies (idf) is here. Each entry is a pair of {idf, word}.
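A minimal Python sketch of how {idf, word} pairs could be computed follows. It assumes the textbook definition idf(i) = log(N / n(i)), where N is the number of documents and n(i) is the number of documents containing word i; the log base and exact formula used by the program are not stated on this page. The function name and sample documents are hypothetical.

```python
import math


def inverse_document_frequencies(docs):
    """Return a sorted list of (idf, word) pairs.

    Assumption: idf(i) = ln(N / n_i), with N the number of documents
    and n_i the number of documents that contain word i.
    """
    n_docs = len(docs)
    doc_count = {}
    for text in docs.values():
        # set() so a word is counted once per document
        for word in set(text.lower().split()):
            doc_count[word] = doc_count.get(word, 0) + 1
    return sorted((math.log(n_docs / n), word) for word, n in doc_count.items())


# Hypothetical two-document collection, for illustration only.
docs = {
    "doc1.txt": "the cat sat on the mat",
    "doc2.txt": "the dog chased the cat",
}
idf_pairs = inverse_document_frequencies(docs)
for pair in idf_pairs:
    print(pair)
```

A word that appears in every document gets an idf of 0 under this definition, so it contributes nothing to a match; rarer words get larger values.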

The weights of document word lists are shown here. This is, in effect, a concatenation of all the document vectors, as it was more convenient to deal with them in a single table. These show { w(i,j), word, document }.
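To make the single-table layout concrete, here is a minimal Python sketch that builds { w(i,j), word, document } rows for a whole collection at once. It assumes the usual tf-idf product w(i,j) = f(i,j) * idf(i), with f(i,j) normalized by the most frequent word in the document and idf(i) = ln(N / n(i)); these formulas are assumptions, not taken from the program. The function name and sample documents are hypothetical.

```python
import math
from collections import Counter


def document_weights(docs):
    """Return (w_ij, word, document) rows for every document in one list.

    Assumptions: w(i,j) = f(i,j) * idf(i), f(i,j) = count / max count in
    the document, idf(i) = ln(N / n_i).
    """
    tokenized = {name: text.lower().split() for name, text in docs.items()}
    n_docs = len(tokenized)

    # n_i: number of documents containing each word
    doc_count = Counter()
    for words in tokenized.values():
        doc_count.update(set(words))

    rows = []
    for name, words in tokenized.items():
        counts = Counter(words)
        if not counts:
            continue  # skip empty documents
        max_count = max(counts.values())
        for word, count in counts.items():
            f_ij = count / max_count
            idf_i = math.log(n_docs / doc_count[word])
            rows.append((f_ij * idf_i, word, name))
    return rows


# Hypothetical two-document collection, for illustration only.
docs = {
    "doc1.txt": "the cat sat on the mat",
    "doc2.txt": "the dog chased the cat",
}
weight_rows = document_weights(docs)
for row in weight_rows:
    print(row)
```

Filtering the rows on the document column recovers an individual document vector, which is why one concatenated table is enough.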

The inverse document frequencies are also used as weights for the query vector, although more sophisticated query weights are suggested in the literature.
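The following Python sketch shows one way a query vector weighted by idf could be compared against the document vectors. It assumes cosine similarity as the ranking function, which is the usual choice in the vector model but is not confirmed by this page. The function names, the idf values, and the document vectors are hypothetical.

```python
import math


def cosine_similarity(doc_vector, query_vector):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(w * query_vector.get(word, 0.0) for word, w in doc_vector.items())
    doc_norm = math.sqrt(sum(w * w for w in doc_vector.values()))
    query_norm = math.sqrt(sum(w * w for w in query_vector.values()))
    if doc_norm == 0 or query_norm == 0:
        return 0.0  # an all-zero vector matches nothing
    return dot / (doc_norm * query_norm)


def rank(doc_vectors, idf, query_terms):
    """Score every document against the query, best match first.

    Each query term gets its idf as its weight; terms that do not occur
    in the collection are ignored.
    """
    query_vector = {term: idf[term] for term in query_terms if term in idf}
    scores = [
        (cosine_similarity(vector, query_vector), name)
        for name, vector in doc_vectors.items()
    ]
    return sorted(scores, reverse=True)


# Hypothetical idf values and document vectors, for illustration only.
idf = {"cat": 0.0, "dog": math.log(2), "mat": math.log(2)}
doc_vectors = {
    "doc1.txt": {"cat": 0.0, "mat": 0.5 * math.log(2)},
    "doc2.txt": {"cat": 0.0, "dog": 0.5 * math.log(2)},
}
ranking = rank(doc_vectors, idf, ["dog"])
for score, name in ranking:
    print(name, score)
```

Because the query weight for a term is just its idf, a query made only of words that occur in every document has an all-zero vector and scores 0 against everything.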

I wrote an explanation of the math for non-math majors here.

If you like Perl, here's the code.