Amazon Similarity Explorer

Web of Knowledge




I wrote most of the Amazon Similarity Explorer (ASE) long before the clarifying comments were added to the assignment; the operative phrase for me was "Using one of several recommendation functions..." I chose to explore the similarity function simply because it was the only recommendation function available through Amazon Web Services, an application programming interface (API) to Amazon.com that I was interested in exploring.

Given a tool like ASE, it is possible to create and examine a large number of personal webs, since the time-consuming steps of construction and link checking are mostly automated. The downside, at least in this implementation, is that you have less freedom in the selection process; a better-designed front end would allow more navigational freedom through the similarity lists. While ASE does allow you to go from one list to another by clicking on the linked ISBN, it limits you to selecting (for your personal web) from a single set of similarities rather than letting you browse, add to your set, browse some more, and so on. Creating a similarity search with a large Search Depth value (say, 50 or 100) gives you a broad selection to choose from, but you are still limited to the items that Amazon sees fit to include in its "similar to" lists.
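To make the Search Depth idea concrete, here is a minimal sketch of how a similarity crawl might gather candidates for a personal web. It is not ASE's actual code. The `SIMILAR` dictionary is a hypothetical stand-in for the lists a web service would return, and `crawl` and its `mode` values are illustrative names. The sketch assumes "wide" means expanding items in the order they were found (breadth-first) and "deep" means expanding the newest item first, which may not match ASE's exact behavior.

```python
from collections import deque

# Hypothetical stand-in for the "similar to" lists a web service would return.
# Keys and values are placeholder IDs, not real ISBNs.
SIMILAR = {
    "A": ["B", "C", "D"],
    "B": ["A", "E"],
    "C": ["F"],
    "D": [],
    "E": ["G"],
    "F": [],
    "G": [],
}


def crawl(seed, depth, mode="wide", similar=SIMILAR):
    """Collect up to `depth` distinct items reachable from `seed`.

    mode="wide": expand items in the order they were found (breadth-first),
    so a whole similarity list is used before going further out.
    mode="deep": expand the most recently found item first (last-in,
    first-out), so the crawl runs down one branch before backtracking.
    """
    frontier = deque([seed])
    seen = {seed}  # never report the seed or any item twice
    found = []
    while frontier and len(found) < depth:
        item = frontier.popleft() if mode == "wide" else frontier.pop()
        for neighbor in similar.get(item, []):
            if len(found) >= depth:
                break
            if neighbor not in seen:
                seen.add(neighbor)
                found.append(neighbor)
                frontier.append(neighbor)
    return found


print(crawl("A", 4, mode="wide"))  # ['B', 'C', 'D', 'E']
print(crawl("A", 4, mode="deep"))  # ['B', 'C', 'D', 'F']
```

With the same seed and depth, the two modes collect different items once the depth cap is reached, which is one way wide and deep searches from the same seed can end up in very different places.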

An occasional oddity pops up, revealing design decisions made by Amazon's database and web service designers. My personal favorite was a search on Microsoft HTML Help Authoring (ISBN 1572316039) that led quickly to Michael Moore's "Stupid White Men." It is a great inside joke: Microsoft HTML Help is a terrible authoring system. Investigating the Amazon page reveals that the "similar to" list for this book is identical to its "Customers who bought this book also bought..." list. This is certainly not always the case, but it seems that Amazon.com uses purchasing patterns as a substitute for editorial opinion on low-volume items.
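A "Customers who bought this book also bought..." list can be approximated by counting how often two items appear in the same order. The sketch below is not Amazon's algorithm (the text above only infers that purchasing patterns are involved); the function names and the made-up `orders` are hypothetical. It does show why low-volume items are fragile: with only a handful of orders, one or two unusual baskets decide what tops the list.

```python
from collections import Counter
from itertools import combinations


def co_purchase_counts(orders):
    """Count, for each item, how often every other item shares an order with it."""
    counts = {}
    for order in orders:
        # set() ignores duplicate copies of a title within one order
        for a, b in combinations(sorted(set(order)), 2):
            counts.setdefault(a, Counter())[b] += 1
            counts.setdefault(b, Counter())[a] += 1
    return counts


def also_bought(item, orders, limit=9):
    """Return up to `limit` items most often bought together with `item`."""
    counts = co_purchase_counts(orders).get(item, Counter())
    return [other for other, _ in counts.most_common(limit)]


# Hypothetical order history for a low-volume title
orders = [
    ["help-guide", "satire"],
    ["help-guide", "satire"],
    ["help-guide", "xml-primer"],
]
print(also_bought("help-guide", orders))  # ['satire', 'xml-primer']
```

The default `limit=9` mirrors the nine-entry similarity lists observed with ASE; it is an illustrative choice, not a documented Amazon parameter.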

Other design decisions add a lot of value: Amazon has built a very sophisticated information retrieval system behind its simple query box. Amazon's search philosophy is to always return something. Even complete gibberish generates a set of products to buy, prefaced by a brief apology: "We found no matches for j asdjl;fal s. Below are results for fal." Designers of information organization systems can learn from this kind of search behavior.
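The "always return something" behavior can be sketched as a fallback: if the whole query matches nothing, retry with individual terms and tell the user which term was used. This is only an assumed strategy (trying the longest term first), not Amazon's actual method; `search`, `search_with_fallback`, and the tiny `catalog` are hypothetical.

```python
import re


def words(text):
    """Lower-case alphanumeric tokens; punctuation such as ';' acts as a separator."""
    return re.findall(r"[a-z0-9]+", text.lower())


def search(query, catalog):
    """Titles containing every word of the query (a simple AND search)."""
    terms = words(query)
    return [title for title in catalog if terms and all(w in words(title) for w in terms)]


def search_with_fallback(query, catalog):
    """Return (query_actually_used, results), falling back to single terms.

    If the full query matches nothing, retry each distinct term, longest first
    (ties broken alphabetically), and report which term produced the results.
    Returns (None, []) only when no term matches at all.
    """
    results = search(query, catalog)
    if results:
        return query, results
    for term in sorted(set(words(query)), key=lambda w: (-len(w), w)):
        results = search(term, catalog)
        if results:
            return term, results
    return None, []


catalog = ["Art Glass Today", "Glass Blowing Basics", "Medical Handbook"]
used, hits = search_with_fallback("art glazz", catalog)
if used is not None and used != "art glazz":
    print(f"We found no matches for art glazz. Below are results for {used}.")
print(hits)  # ['Art Glass Today']
```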

Other strange results in ASE arise from Amazon's own quirks. A search for "art glass" returns a web of medical dictionaries and handbooks; checking the Amazon.com results for the same search reveals a medical handbook inexplicably at the top of the result set. Occasionally, spurious connections appear and then disappear; during one search of cookbooks, a SQL Server 2000 manual mysteriously showed up in the middle of my personal web. Had someone just purchased a cookbook and a database manual, creating a temporary link in the Amazon database? When I reran the identical query, the manual was gone. The Amazon database is a dynamic entity, so an application like ASE that interrogates it can return different results for the same query over relatively short periods of time.
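Because the database changes underneath the application, comparing two runs of the same query is a simple way to spot transient links like the SQL Server manual. A minimal sketch, assuming each run's results are saved as a list of item identifiers; `compare_runs` and the sample IDs are hypothetical.

```python
def compare_runs(first, second):
    """Compare the items returned by two runs of the same query."""
    before, after = set(first), set(second)
    union = before | after
    return {
        "dropped": sorted(before - after),
        "added": sorted(after - before),
        "stable": sorted(before & after),
        # Jaccard overlap: 1.0 means both runs returned exactly the same items
        "overlap": len(before & after) / len(union) if union else 1.0,
    }


run1 = ["cookbook-1", "cookbook-2", "sql-manual", "cookbook-3"]
run2 = ["cookbook-1", "cookbook-2", "cookbook-3", "cookbook-4"]
report = compare_runs(run1, run2)
print(report["dropped"], report["added"], report["overlap"])
# ['sql-manual'] ['cookbook-4'] 0.6
```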

Interesting patterns in information organization emerge through experimentation with ASE. Some I've seen include:

  • Information architecture (keyword search, depth 100, wide): Going along, I hit the O'Reilly books and got 41 O'Reilly titles in a row, then broke out!
  • Information architecture (keyword search, depth 100, deep): I got stuck in a list of "Schaum's Outline of ______" titles, all of which refer to one another.
  • Defoe (author search, depth 30, wide): 30 books with no duplications, giving a density of zero. I made a personal web of the 18 I had read; it had a density of 35%. With 18 picks, the highest density would be 50% because the similarity list for each book contains only nine entries. Thus "Gulliver's Travels", which links to nine other books, was actually 100% linked: every one of its similarities was in the Defoe personal web.
  • China (keyword search, depth 24, wide): This personal web travels from China through library science and into telecommunications. The high relevance given to authors in similarity lists is reflected here: a wide-ranging classics scholar, Lionel Casson, moves the similarity search from the ancient Far East into library science through his work "Libraries in the Ancient World." That book links to "Double Fold: Libraries and the Assault on Paper" by Nicholson Baker, which leads to an entirely different web of LIS books. The China web and the LIS web each have typical degrees of connectivity, but only this single link (Casson-Baker) joins the two worlds.
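The density figures in the Defoe example can be computed mechanically once the similarity lists are known. The text does not spell out its density formula, so this sketch assumes density is the number of directed similarity links inside the web divided by the n(n-1) possible ones. Under that assumption, a nine-entry list caps density for 18 picks at 9/17, a little over 50%. The function names and the small `similar` map are hypothetical.

```python
def web_density(picks, similar):
    """Share of possible directed links that actually occur inside a personal web.

    A link u -> v counts when v appears in u's similarity list and both u and v
    were picked. Assumed definition: links / (n * (n - 1)).
    """
    web = set(picks)
    n = len(web)
    if n < 2:
        return 0.0
    links = sum(1 for u in web for v in similar.get(u, []) if v in web and v != u)
    return links / (n * (n - 1))


def coverage(item, picks, similar):
    """Fraction of one item's similarity list that falls inside the web."""
    listed = similar.get(item, [])
    if not listed:
        return 0.0
    web = set(picks)
    return sum(1 for v in listed if v in web) / len(listed)


def density_ceiling(n, list_size=9):
    """Upper bound on web_density when each item lists at most `list_size` others."""
    return min(list_size, n - 1) / (n - 1) if n > 1 else 0.0


# Hypothetical similarity lists for four items
similar = {"A": ["B", "C"], "B": ["A"], "C": ["D"], "D": []}
picks = ["A", "B", "C"]
print(web_density(picks, similar))         # 0.5 (3 of 6 possible links)
print(coverage("A", picks, similar))       # 1.0, like "Gulliver's Travels"
print(round(density_ceiling(18), 3))       # 0.529
```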


Similarity lists in Amazon clearly have some organizing principles:

  • Works by the same author are often closely linked. If your ASE search finds a book by a prolific best-selling author, you may never escape from the associated string of similar items.
  • Works in a series are closely linked. CliffsNotes, multi-volume sets, and technical guides like the "For Dummies" series all have a high degree of internal linking.
  • Obscure works have connections that, at times, make little or no sense.
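One rough way to check the first two observations on a crawl's output is to measure what fraction of similarity links stay within one author (or one series). A minimal sketch; `within_group_share` and the sample links and author mapping are hypothetical, and real metadata would have to come from the item records the service returns.

```python
def within_group_share(links, group_of):
    """Fraction of links (u, v) whose endpoints belong to the same group.

    `group_of` maps an item to its author or series; items missing from the
    mapping never count as a match.
    """
    if not links:
        return 0.0
    same = sum(
        1 for u, v in links
        if group_of.get(u) is not None and group_of.get(u) == group_of.get(v)
    )
    return same / len(links)


# Hypothetical items, authors, and directed similarity links
author = {"b1": "Author A", "b2": "Author A", "b3": "Author B"}
links = [("b1", "b2"), ("b2", "b1"), ("b1", "b3"), ("b3", "b4")]
print(within_group_share(links, author))  # 0.5
```

Passing a series mapping instead of an author mapping tests the second observation the same way.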

Finally, in considering ASE and the Amazon Web Services interface to Amazon.com, we should consider the question of its validity as a bibliographic system. Svenonius (2002) provides an updated version of the IFLA bibliographic objectives containing five requirements: locating, identifying, selecting, obtaining, and navigating. ASE is not designed to be a complete bibliographic system, and it fails miserably on most counts:

  1. Locating: While offering item access by author, ISBN, and keyword, Amazon Web Services (and thus ASE) does not offer a specific "title" search type. Experimentation shows, however, that putting the title of a work into a keyword search returns the intended book. Because ASE is designed to explore similarities, though, the book itself is not returned; only its similarities are.
  2. Identifying: Few of the identifying characteristics of a given work are returned through this exercise.
  3. Selecting: A selection mechanism is included for users to add books to their personal web.
  4. Obtaining: ASE does not help the user obtain the book.
  5. Navigating: Navigation needs lots of improvement. One can go from one similarity list to a new one by clicking on an ISBN, thereby starting a new search. However, many other improvements in browsability and navigability can readily be imagined.

While an "OK" exploration tool, ASE fails miserably as a bibliographic system because it was not designed to be one. What is interesting, though, is how an intellectual framework like this could be used to improve an application like ASE. If one were building a real service for end users, Svenonius's objectives would provide useful ideas for design and implementation.

For instance, an author search using AWS returns an array of books related to that author. If we were building a bibliographic system, we would want to display that result list to the user for consideration. However, ASE ignores all but the first book in the result list, using it as the key for a similarity search. This is a defensible choice (this is the Similarity Explorer, not a "Collocation Explorer"), but it does limit ASE's bibliographic utility. It might be more useful to display the entire array of books resulting from the first search, and then let the user pick the one to use as the beginning of the "similar to" chain.
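The change suggested above is small in code terms: keep the whole result list and let the user choose the seed, with "take the first result" as the default. A minimal sketch; `show_results`, `choose_seed`, and the placeholder titles are hypothetical and not part of ASE or AWS.

```python
def show_results(results):
    """Number the full result list so a user can choose a seed."""
    return [f"{i}: {title}" for i, title in enumerate(results)]


def choose_seed(results, pick=None):
    """Pick the book that starts the "similar to" chain.

    pick=None reproduces ASE's current behavior (first result); an integer
    index lets the user choose any entry from the full list.
    """
    if not results:
        raise ValueError("search returned no books")
    if pick is None:
        return results[0]
    if not 0 <= pick < len(results):
        raise IndexError(f"pick must be between 0 and {len(results) - 1}")
    return results[pick]


# Placeholder titles standing in for an author search result
results = ["Title A", "Title B", "Title C"]
print(show_results(results))      # ['0: Title A', '1: Title B', '2: Title C']
print(choose_seed(results))       # Title A
print(choose_seed(results, 2))    # Title C
```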

Amazon Web Services offers programmers powerful access to Amazon's database. However, the richness and variety that make Amazon.com a fun shopping experience come not from the database, but from the layers of information organization built on top of it. The diversity offered by the many lists in the Amazon.com user interface is lacking in ASE because ASE relies exclusively on the similarity function.