Saturday, November 25, 2006
Awesome Semantic Web Application!!
Don't walk, run over to pandora.com for a glimpse at a robust music ontology in action! This website is a recommender system that uses a music ontology (smartly called the Music Genome Project) to recommend music similar to your known favorites by decomposing your preference into its component parts. I have been using this for a few days and my initial impression is that it works great! Kudos!
How many more "Genome" (i.e. Semantic Web) projects will we see in the coming years? If this is any indication, my guess is many, many ...
Torching Doctorow: Part 1
Cory Doctorow wrote an article in 2001 called "Metacrap" where he purports to expose seven fatal flaws to reaching a "metatopia" where there is a "world of exhaustive, reliable metadata."
Before we dig into his 7 "straw-men", let's examine a little metadata about Cory Doctorow to determine his qualifications for making such assertions. First, it is interesting that he does not post any qualifications or references to back up his assertions in the article. We do not know how much metadata, if any, he has actually created. Second, Wikipedia claims that he only has a high school degree. Third, he is certainly not a practicing IT professional, per his own website. Given these three things, we must take his assertions with a big grain of salt. However, given that a number of blog entries link to his article, it is worthwhile to at least examine his arguments. Of course, publicity is no indicator of truth, and I surmise that most links to his article are merely people commiserating over the fact that metadata is hard to do right. Fortunately, there are those of us who still believe that "hard" does not equate to "wrong" and that this is a temporary state due to lack of expertise. More on that later ... let's get back to Doctorow.
Straw-man #1. People Lie. Doctorow uses this to attack the reliability of metadata. His argument is that because metadata "lives in a competitive world", people will lie to gain advantage. Frankly, this is a ridiculous statement because NOT all metadata lives in a competitive world. In fact, the most important metadata, enterprise metadata, will live not on the "wild, uncontrolled internet" but on the controlled corporate intranet.
So, let's debunk this in a number of ways:
a. People without access to my metadata can lie all they want and it won't affect me.
b. People lie more when they can lie without attribution. That is why Wikipedia has so many problems (lack of attribution).
c. People lie about both data and metadata, but Doctorow is not saying we should distrust the entire internet.
So, the real point here is that non-attributed data and metadata, of any type, are untrustworthy. Fortunately, every good metadata development process includes attribution via governance, so this is truly a tangential argument (at best).
d. Metadata is not the victim of people lying but a cure for that problem. For example, a metadata attribute of "reliability", which is used in a number of very credible organizations, is quite effective in measuring the trustworthiness of a source of information. Of course, in some cases, capturing lineage can replace additional metadata attributes for judging reliability.
Daconta's Counterpoint #1. Reliable data and metadata is properly attributed.
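To make the attribution and "reliability" ideas a bit more concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the MetadataRecord class, its field names, the 1-to-5 rating scale, and the threshold are hypothetical and do not come from any particular organization's scheme.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MetadataRecord:
    """A hypothetical metadata entry; all field names are illustrative."""
    subject: str                          # the data asset being described
    value: str                            # the metadata assertion itself
    attributed_to: Optional[str] = None   # who made the assertion (None = anonymous)
    reliability: Optional[int] = None     # assumed scale: 1 (low) to 5 (high)


def is_trustworthy(record: MetadataRecord, min_reliability: int = 3) -> bool:
    """Accept a record only if it is attributed and its source is rated well enough."""
    if not record.attributed_to:
        return False  # non-attributed metadata is rejected outright
    if record.reliability is None:
        return False  # unrated sources are rejected in this sketch
    return record.reliability >= min_reliability


anonymous = MetadataRecord("report.doc", "author=J. Smith")
weak = MetadataRecord("report.doc", "author=J. Smith", "unvetted-feed", 1)
strong = MetadataRecord("report.doc", "author=J. Smith", "records-office", 5)

print([is_trustworthy(r) for r in (anonymous, weak, strong)])
```

Rejecting unrated records is only one possible policy. Where lineage is captured, a check on the chain of sources could stand in for the explicit rating.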
Next time we will examine Straw-man #2...
Sunday, November 05, 2006
Geospatial Metadata
Ok, moving on from the application metadata example (by the way, that was very helpful in ferreting out the principles of metadata design) - I am now researching interrogative metadata (who, what, when, where, why). The basic principle here is that the interrogatives are powerful vertices or axes for describing our data assets. To that end, I am starting with "where" and geospatial metadata. Of course, there is a wealth of information on creating and using geospatial metadata. The Federal Geographic Data Committee has a good site here. Also, Geospatial One-Stop has some good pages on metadata creation.
More on this soon...