More precisely: each website has a vector of approximately 50,000 dimensions, 49,990 of which are 0, but 10 of which have a value. The "closest" URLs in this space are those with the smallest distance from the searched URL. I've thought of doing singular value decomposition, but frankly never got around to it, because the results are already quite good. This is just the tip of the iceberg -- I'm sure delicious has well over 10,000,000 URLs.
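A minimal sketch of the idea, using made-up URLs and tag weights (the real data would have ~50,000 possible tags). Since only ~10 entries per URL are nonzero, a dict works as a natural sparse vector; here I use cosine similarity as one reasonable choice of distance:

```python
from math import sqrt

# Hypothetical tag-weight vectors: each URL maps tag -> weight.
# With ~10 nonzero entries out of ~50,000 dimensions, a dict is a
# natural sparse representation.
vectors = {
    "example.com/a": {"python": 5, "tutorial": 3, "programming": 2},
    "example.com/b": {"python": 4, "programming": 3, "reference": 1},
    "example.com/c": {"cooking": 6, "recipes": 4},
}

def cosine_similarity(u, v):
    """Cosine similarity between two sparse vectors (dicts)."""
    # Only tags shared by both URLs contribute to the dot product.
    dot = sum(w * v[tag] for tag, w in u.items() if tag in v)
    norm_u = sqrt(sum(w * w for w in u.values()))
    norm_v = sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def closest(url, vectors, n=5):
    """Rank the other URLs by similarity to the searched URL."""
    target = vectors[url]
    ranked = ((cosine_similarity(target, v), u)
              for u, v in vectors.items() if u != url)
    return sorted(ranked, reverse=True)[:n]

print(closest("example.com/a", vectors))
```

In this toy example, `example.com/b` ranks closest to `example.com/a` because they share the `python` and `programming` tags, while `example.com/c` shares nothing and scores 0.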
Delicious.com has an extremely rich set of data: a very large, and arguably the most relevant, portion of the web has been described by hand#, by real people, with no incentive to cheat or skew the results. The number of man-hours spent tagging and organizing the web is really astounding, and I'm glad I managed to make something useful out of it.
I really hope Yahoo comes to its senses and realizes how valuable this data is. Just factoring URL (or domain) popularity into search results would add so much value. One could build a killer search engine with this data.
#: A lot of the final tag weights have to do with how the URL was initially tagged, since many people opt for "auto-tagging," which just copies the most popular tags. Not only are the end results of tagging quite interesting, but I suspect researching how a URL's popularity and description have ebbed and flowed over time would be awesome.