Tuesday, March 14, 2006
Blog Project #2: Metadata Types
We have been working through redefining metadata, which is the first project for this blog. Though we are not nearly done, it is time to introduce a related project to catalog all the various types of metadata. I began a list tonight and posted it here.
What do you think are the key types of metadata?
Do you agree with the "traditional types" like structural, descriptive and administrative metadata?
How do these types square up against our definition?
Personally, I have a problem with structural metadata, as I think it blurs the line between data and metadata, which I believe is harmful.
Getting a good authoritative set of metadata types (and their definitions) will help us understand and explain the purpose of metadata. From the looks of it, I think we will have an interesting time winnowing down the list!
A side effect of collecting this list will be an interesting snapshot of our current understanding (or lack thereof) of just what constitutes metadata.
Comments:
The underlying question to consider when defining different types of metadata is the "axis of consideration", i.e. which consideration distinguishes one type of metadata from another? If you distinguish based on how it's used, you will come up with "search metadata", "audit metadata", "configuration management metadata", "data quality metadata" and others, which seem like potentially valid categories to me.
Let's pick another axis of consideration: conceptual level of representation. Here, we'd have "structural metadata" (referring to the form of data), "semantic metadata" (referring to the meaning of data), and a few others.
Creating a complete list of all types of metadata would raise three issues. First, it would be mixing apples and oranges ("semantic metadata" and "data quality metadata" aren't even at the same conceptual level of analysis). Second, the idea that there could be a complete list rests on the assumption that there can be a complete list of "axes of consideration". Third, the classes overlap heavily: some "semantic metadata" could also be "search metadata", so many of the definitions would only be restatements of something mostly the same, slightly different.
I'd guess that there are roughly as many axes of consideration as there are purposes for using metadata -- which is a big number. One approach to making this tractable would be to name types by pairing multiple characteristics to create a compound, e.g. "semantic search metadata" or "structural quality metadata". In other words, use two axes in the definition to be very specific.
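The two-axis naming idea can be sketched in a few lines of Python. This is only an illustration: the class and function names (`Level`, `Usage`, `compound_type`, `all_compound_types`) are made up for this sketch, and the axis values are just the examples mentioned in this comment, not a complete or authoritative set.

```python
from enum import Enum
from itertools import product


class Level(Enum):
    """Axis 1 (hypothetical name): conceptual level of representation."""
    STRUCTURAL = "structural"
    SEMANTIC = "semantic"


class Usage(Enum):
    """Axis 2 (hypothetical name): how the metadata is used."""
    SEARCH = "search"
    AUDIT = "audit"
    CONFIGURATION_MANAGEMENT = "configuration management"
    QUALITY = "quality"


def compound_type(level: Level, usage: Usage) -> str:
    """Name a metadata type by pairing one value from each axis."""
    return f"{level.value} {usage.value} metadata"


def all_compound_types() -> list[str]:
    """Enumerate every pairing of the two axes."""
    return [compound_type(level, usage) for level, usage in product(Level, Usage)]


if __name__ == "__main__":
    for name in all_compound_types():
        print(name)
```

Even with two levels and four uses this already yields eight compound types, which hints at how quickly the list grows once more axes or values are added.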
OK, out of theoretics and back down to earth. You're asking for a finite list of metadata types. I've been working with metadata people for a few years, so I'll just list the types that I most frequently come into contact with, in no particular order:
- Structural RDBMS metadata. (Descriptions of physical data models, row/column/view metadata)
- Dublin Core "resource" metadata (the general stuff the DDMS captures is very popular)
- Search metadata - extra stuff beyond Dublin Core that can get fairly sophisticated, such as specialized keywords, Amazon.com's concept of "Statistically Improbable Phrases (SIPs)", etc.
- Process log metadata. (e.g. X records processed by our system in Y seconds, Z customer transactions handled of business type T, A special exceptions/failures to business process P, perhaps CVS changelogs and check-in information, and so on)
- Logical data models. (Data warehouse construction, information integration tasks)
The trickiest kind, and one that is actually created frequently, would be compliance metadata. It's tricky because it gets into the sticky "one man's data is another man's metadata" quicksand: one information system or organization passes its metadata to another system, which uses it as data for some compliance task. (A DoD example might be DITPR.) Systems that track Sarbox compliance sometimes take somebody's metadata as input.
Thanks for your post. Been away so I have not been able to update the blog.
I understand your point on the difficulties in defining metadata types, but I find your list rather informative. My initial purpose is both exploration and an as-is definition; after that, I want to see what insights I can extract from the result.
I updated the list this morning with some sources on Wikipedia, and I will also add your types in the next few days.
Regards,
- Mike