Theory of Information Retrieval, Florida State University LIS-5263 (Fall 2003)
Written by Rich Ackerman, September 25, 2003
http://www.hray.com/
This program shows the use of the vector model of information retrieval to search a collection of documents for keywords you supply.
The document set consists of 10 files:
The normalized term frequencies [f(i,j)] are here, listed as triples of {term frequency, word, document}.
The inverse document frequencies (idf) are here, listed as pairs of {idf, word}.
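As a rough illustration of what such triples might look like, here is a minimal Python sketch. It is not the Perl program linked from this page. It assumes that each raw count is normalized by the count of the most frequent word in the same document, which is a common textbook choice; the page does not state which normalization was used. The function name normalized_tf and the two toy documents are made up for the example.

```python
from collections import Counter

def normalized_tf(tokens):
    """Return {word: f}, where f = raw count / count of the most frequent word.

    Assumption: max-count normalization; the original page does not say
    which normalization the program applies.
    """
    counts = Counter(tokens)
    if not counts:
        return {}
    max_count = max(counts.values())
    return {word: count / max_count for word, count in counts.items()}

# Hypothetical toy documents, not the 10 files used by the program.
docs = {
    "doc1": "the cat sat on the mat".split(),
    "doc2": "the dog chased the cat".split(),
}

# Build (term frequency, word, document) triples, mirroring the table layout.
triples = [
    (f, word, name)
    for name, tokens in docs.items()
    for word, f in sorted(normalized_tf(tokens).items())
]

for triple in triples:
    print(triple)
```

With this normalization, the most frequent word in each document always gets a frequency of 1.0 and every other word falls between 0 and 1.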
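For readers who want to see the idea in code, here is a minimal Python sketch of an idf table. It is not the Perl program linked from this page. It assumes the usual definition idf = log(N / n), where N is the number of documents and n is the number of documents containing the word, and it assumes the natural logarithm; the page does not say which base was used. The function name and the three toy documents are made up for the example.

```python
import math

def inverse_document_frequencies(docs):
    """Return {word: log(N / n)} for N documents, n of which contain the word.

    Assumption: natural log and no smoothing; the original page does not
    specify either.
    """
    n_docs = len(docs)
    doc_counts = {}
    for tokens in docs.values():
        # set() so a word counts once per document, however often it repeats.
        for word in set(tokens):
            doc_counts[word] = doc_counts.get(word, 0) + 1
    return {word: math.log(n_docs / n) for word, n in doc_counts.items()}

# Hypothetical toy documents, not the 10 files used by the program.
docs = {
    "doc1": "the cat sat".split(),
    "doc2": "the dog ran".split(),
    "doc3": "the cat ran".split(),
}

idf = inverse_document_frequencies(docs)

# List {idf, word} pairs, lowest idf first.
pairs = sorted((value, word) for word, value in idf.items())
for pair in pairs:
    print(pair)
```

A word that appears in every document gets an idf of zero, so it contributes nothing to a match; a word found in only one document gets the highest idf.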
The weights of document word lists are shown here. This is, in effect, a concatenation of all the document vectors, as it was more convenient to deal with them in a single table. These show { w(i,j), word, document }.
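Here is a minimal Python sketch of how one table of weights can hold all the document vectors at once. It is not the Perl program linked from this page. It assumes the common tf-idf product w(i,j) = f(i,j) * idf(i); the function names and the numbers are made up for the example.

```python
def document_weights(tf_triples, idf):
    """Turn (f, word, doc) triples and {word: idf} into (w, word, doc) triples.

    Assumption: w(i,j) = f(i,j) * idf(i), the usual tf-idf product.
    """
    return [(f * idf[word], word, doc) for f, word, doc in tf_triples]

def vectors_by_document(weight_triples):
    """Split the single concatenated table back into one vector per document."""
    vectors = {}
    for w, word, doc in weight_triples:
        vectors.setdefault(doc, {})[word] = w
    return vectors

# Hypothetical values, chosen only to keep the arithmetic easy to follow.
tf_triples = [
    (1.0, "cat", "doc1"),
    (0.5, "mat", "doc1"),
    (1.0, "cat", "doc2"),
    (1.0, "dog", "doc2"),
]
idf = {"cat": 0.0, "mat": 0.7, "dog": 0.7}

weights = document_weights(tf_triples, idf)
vectors = vectors_by_document(weights)

for row in weights:
    print(row)
```

Keeping the document name in every row means the table can be filtered or grouped by document whenever a single vector is needed.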
The inverse document frequencies are also used as weights for the query vector, although more sophisticated query weights are suggested in the literature.
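To show how a query vector weighted this way could be matched against the documents, here is a minimal Python sketch. It is not the Perl program linked from this page. It assumes that documents are ranked by cosine similarity, the standard measure in the vector model; the page does not confirm the exact scoring. The function names, weights, and document names are made up for the example.

```python
import math

def cosine(doc_vec, query_vec):
    """Cosine of the angle between two sparse vectors stored as {word: weight}."""
    dot = sum(w * query_vec.get(word, 0.0) for word, w in doc_vec.items())
    norm_d = math.sqrt(sum(w * w for w in doc_vec.values()))
    norm_q = math.sqrt(sum(w * w for w in query_vec.values()))
    if norm_d == 0 or norm_q == 0:
        return 0.0  # an all-zero vector matches nothing
    return dot / (norm_d * norm_q)

def rank(doc_vectors, idf, keywords):
    """Score every document against the keywords, best match first."""
    # The query weight of each keyword is simply its idf;
    # keywords that never occur in the collection are dropped.
    query_vec = {word: idf[word] for word in keywords if word in idf}
    scores = [(cosine(vec, query_vec), name) for name, vec in doc_vectors.items()]
    return sorted(scores, reverse=True)

# Hypothetical idf values and document vectors, for illustration only.
idf = {"cat": 0.4, "dog": 1.1, "mat": 1.1}
doc_vectors = {
    "doc1": {"cat": 0.4, "mat": 0.55},
    "doc2": {"cat": 0.4, "dog": 1.1},
}

ranking = rank(doc_vectors, idf, ["dog"])
for score, name in ranking:
    print(name, round(score, 3))
```

Because the query weights come straight from the idf table, a rare keyword pulls the query vector more strongly toward documents that contain it than a common keyword does.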
I wrote an explanation of the math for non-math majors here.
If you like Perl, here's the code.