Friday, March 24, 2006

 

Biometrics Metadata

I have been swamped the last few days helping our developers on the Biometrics Automated Toolset (BAT) debug and redesign a distributed matching engine. Apart from the urgency of fixing bugs in a fielded system, it was great to be coding again. Those who know me know that I never pass up an opportunity to code.
That brings up the issue of biometrics metadata and person-centric metadata. For biometric data itself, NIST has defined multiple standards. Biometrics metadata typically falls into a few types:

An important issue relating to biometrics is distinguishing between person and identity. I recently wrote an internal Oberon newsletter article that defined the two as follows:

How you model these is critical, and which one goes on a watchlist has important ramifications for the actions that should be taken.

Biometrics metadata is one of my top candidates for important areas to focus on. Most people intuitively get the importance of data and metadata about people; however, as the person/identity issue points out ... we are still not modeling this category properly.

Monday, March 20, 2006

 

The Government Metadata Controversy

At the end of last year, Government Computer News published the article, "Metadata Not Essential For Search", based on the results of a Request for Information asking about "Efficient and Effective Information Sharing".

The report said that the results "overwhelmingly" support the hypothesis that “..for the majority of government information, exposing it to indexing with commercial search technology is sufficient to meet the information categorization, dissemination, and sharing needs of the public and as required by law and policy.”

Of course, the word "overwhelmingly" was used because 56% of the respondents supported the hypothesis. One respondent stated that "Search technology has progressed far enough so that manual categorization and metadata tagging of textural documents is no longer necessary and any perceived gain in accessibility does not justify the cost of categorization."



This is the "Google as a Silver Bullet" argument, but it raises some interesting points.

  • Precise vs. "Good Enough" Results - Can or should the government settle for "good enough" results when indexing their non-marked-up data with COTS tools out of the box?

  • Availability of Data - The metadata markup process (when it is not automated) may limit how soon that data is available. This is an issue that I think the government is painfully aware of.

  • Hybrid Solutions? A Focus on Rules for Pattern Recognition, Auto-Tagging? - This is what I call "guess metadata" (metadata that is determined by a computer process, and not a man-in-the-loop). One thing that the report didn't really focus on was that much time and effort is spent defining pattern recognition rules for concepts/keywords for automated markup of metadata in the search engine indexing process. These rules are utilized at indexing time, and increase the likelihood of good results vs. a "search engine out of the box" solution.

    From a business perspective, looking at the results of the respondents in this paper, we do need to recognize the need to get the data out faster with minimal impact to organizations making data available. In doing so, we can focus on rules and a pattern-recognition process at pre-indexing time. At indexing time, this process can tag the data with agreed-upon metadata standards, even tying elements of the data to classification taxonomies and ontologies. Of course, IMHO, any automated process is still "guess metadata". Is "guess metadata" good enough?

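To make the "guess metadata" idea above a little more concrete, here is a minimal sketch of rule-based auto-tagging at indexing time. It is only an illustration under stated assumptions: the `TagRule` class, the `guess_metadata` function, the tag names and the patterns are all hypothetical and do not come from the report or from any particular search product.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class TagRule:
    """One pattern-recognition rule: a regex mapped to a metadata tag."""
    tag: str      # hypothetical taxonomy term, e.g. "topic:biometrics"
    pattern: str  # regular expression that signals the concept


# Illustrative rules only; real rules would be agreed upon per domain.
RULES = [
    TagRule("topic:biometrics", r"\b(fingerprint|iris scan|biometric)s?\b"),
    TagRule("topic:information-sharing", r"\binformation[- ]sharing\b"),
    TagRule("doc-type:rfi", r"\brequest for information\b"),
]


def guess_metadata(text, rules=RULES):
    """Return machine-assigned tags, each labeled as a guess with its evidence."""
    tags = []
    for rule in rules:
        match = re.search(rule.pattern, text, flags=re.IGNORECASE)
        if match:
            tags.append({
                "tag": rule.tag,
                "source": "machine",         # no man-in-the-loop
                "evidence": match.group(0),  # the text that triggered the rule
            })
    return tags
```

Calling `guess_metadata(document_text)` before handing a document to the indexer would yield a list of tags to store alongside it. Recording the source and the triggering text with each tag keeps the guess visible as a guess, so a person can review or override it later.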


    Sunday, March 19, 2006

     

    Away for a few days

    Been away down to South Carolina... Updated the Metadata Types page with a few more entries. Also working on the next version of the definition which I will post soon.

    Tuesday, March 14, 2006

     

    Blog Project #2: Metadata Types

    We have been working through redefining metadata, which is the first project for this blog. Though we are not nearly done, it is time to introduce a related project to catalog all the various types of metadata. I began a list tonight and posted it here.
    What do you think are the key types of metadata?
    Do you agree with the "traditional types" like structural, descriptive and administrative metadata?
    How do these types square up against our definition?
    Personally, I have a problem with structural metadata because I think it blurs the line between data and metadata, which I believe is harmful.
    Getting a good authoritative set of metadata types (and their definitions) will help us understand and explain the purpose of metadata. From the looks of it, I think we will have an interesting time winnowing down the list!
    Another useful side effect of collecting this list will be a snapshot of our current understanding (or lack thereof) of just what constitutes metadata.

    Monday, March 13, 2006

     

    Strawman definition and Open XMP

    Here is my first cut at a strawman definition:
    Metadata – an external description of a distinct data resource to provide context, metrics or amplification.

    Let me know what you think...
    On another note, Adobe announced Open XMP. I am looking forward to checking this out. It is good to see Adobe take a leadership role in this space. Note: XMP uses the W3C RDF specification.

    Wednesday, March 08, 2006

     

    Speaking on April 13, 2006 on Information Sharing

    Visit dmforum for information on my upcoming presentation on information sharing. Bill Inmon and Dick Burk are also speaking. My presentation will focus on Information Sharing and specifically the metadata of the Federal Enterprise Architecture Data Reference Model. Over the next few months I want to refine that metadata and even work on the next version of the XML Schema. If you would like to work with me on this, feel free to email me at my oberon associates address.

    Tuesday, March 07, 2006

     

    Hysteresis, Cost and "About-ness"

    An article from Sean McGrath on Metadata asserts that there is a natural delay between creating information resources and understanding them. This natural delay is called hysteresis. His solution to this problem is to let creators create content and categorizers (or metadata experts) add the metadata at a later time. I disagree with this as a general principle because it increases the cost side of the metadata equation without being able to concretely define the commensurate value received. In an ideal world, the majority of metadata can be added at the point of creation with the flexibility to add additional metadata afterwards. I previously worked on a project called the Virtual Knowledge Base, where we devised a scheme for multiple layers of metadata that differentiated between machine-generated and human-generated metadata (and allowed multiple layers of both).
    Our key takeaway principle is this: Don't put the metadata cart before the business value horse.
    Since metadata is extrinsic information, it is either automated (ideal solution and zero cost), semi-automated and done at the point of creation (small cost) or worth the cost of adding it after the fact. The last option is listed last for good reason.
    The article uses the term "about-ness" to describe metadata and I think this is a superb, though made-up, word. Let's add that one to key terms in our new definition of metadata. Our current list of terms for our new definition of metadata is:
    1. description
    2. context
    3. "about-ness"
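The layered approach mentioned above (machine-generated and human-generated metadata, added at creation or later) might be sketched roughly as follows. This is not the Virtual Knowledge Base design; the class names, field names and the rule that later layers override earlier ones are assumptions made purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataLayer:
    source: str    # "machine" or "human"
    stage: str     # "creation" or "after-the-fact"
    entries: dict  # metadata key/value pairs contributed by this layer


@dataclass
class Resource:
    name: str
    layers: list = field(default_factory=list)

    def add_layer(self, source, stage, **entries):
        """Append a new layer without modifying any existing layer."""
        self.layers.append(MetadataLayer(source, stage, entries))

    def effective(self):
        """Merge layers in order; assumed rule: later layers win per key."""
        merged = {}
        for layer in self.layers:
            for key, value in layer.entries.items():
                merged[key] = {"value": value, "source": layer.source}
        return merged
```

A resource could get a machine layer at the point of creation, for example `doc.add_layer("machine", "creation", author="jsmith")`, and a human layer only when the business value justifies the cost of adding one after the fact. Because layers are never overwritten, it stays clear which values were guessed and which were asserted by a person.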


    Monday, March 06, 2006

     

    The Good, the Bad and the Ugly of Administrative Metadata

    Administrative metadata is descriptive information about the creation and management of information assets, such as author, creation date, etc. The best-known standard for tracking this type of information comes from the Dublin Core Metadata Initiative.
    The good news is that this type of metadata is well-understood, fairly easy to implement and even partially supported in most editing applications like MS Word, Photoshop, etc.
    The bad news is that this type of metadata is so ubiquitous that it is intimidating to manage and therefore seen by many managers as overkill. Think of the number of records required in a metadata repository to catalog all the administrative metadata on every document, diagram, photo and report created in your organization. It is a scary proposition for large organizations, and if you cannot clearly define the benefit gained from such management, such an undertaking is dead on arrival. So, unless you control the digital production process (and some organizations do) where this step can be automated or semi-automated, I do not recommend you try to capture this metadata. That brings us to the ugly...
    Several recent stories (here and here) have highlighted the danger of unmanaged and sometimes hidden metadata in photos and documents that embarrasses the organizations that created them. Does this mean we are back to recommending storage of all administrative metadata to eliminate such gaffes? No. It means that organizations do need to be aware of the metadata that editing programs store and then determine which digital data are important enough to require control of the production process.
    A good resource would be a listing of all the metadata stored by current popular editing tools... if you know of such a resource (or have time to develop one), please let me know.
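As a rough illustration of being aware of what a file carries before it is released, the sketch below separates administrative metadata that is cleared for publication from metadata that should be stripped. The 15 element names are those of the Dublin Core Metadata Element Set; everything else (the function names, the sample record, and the `last_saved_by` and `tracked_changes` fields) is hypothetical and stands in for whatever a given editing tool actually stores.

```python
# The 15 elements of the Dublin Core Metadata Element Set.
DUBLIN_CORE_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}


def non_dublin_core(record):
    """List fields that fall outside Dublin Core, e.g. tool-specific extras."""
    return sorted(key for key in record if key not in DUBLIN_CORE_ELEMENTS)


def split_for_release(record, allowed):
    """Separate metadata cleared for publication from metadata to strip."""
    keep = {key: value for key, value in record.items() if key in allowed}
    strip = sorted(key for key in record if key not in allowed)
    return keep, strip


# Hypothetical metadata extracted from a document by some other tool.
sample = {
    "title": "Quarterly Report",
    "creator": "J. Smith",
    "date": "2006-03-06",
    "last_saved_by": "intern01",       # assumed tool-specific field
    "tracked_changes": "3 deletions",  # assumed tool-specific field
}
```

Running `split_for_release(sample, allowed={"title", "date"})` returns the fields that are safe to publish plus a sorted list of the ones to remove. The point is less the code than the policy: someone has to decide the allowed set for each class of document.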

    Sunday, March 05, 2006

     

    "data about data" considered harmful!

    The classical definition of metadata is what I refer to as the "geek definition" because it includes a nice double entendre with its recursiveness. Unfortunately, this definition is like nails on a chalkboard to business managers. Business managers don't want the clever definition... they don't want to know how it is cool to be both metadata about some other data and data at the same time which can have metadata pointing to it which can then be data for more metadata pointing to it, on and on ... until their eyes glaze over. In essence, that is a technically accurate but practically foolish definition. I am working on a better definition and your feedback is welcome.

    Here is my initial requirement for a new definition:

    Let's begin the work of fixing this mess... and please, hold the cleverness.


    Saturday, March 04, 2006

     

    Story clarification

    Had dinner on Thursday with two good friends, Kevin Smith and Danny Proko. At dinner, Kevin said that he had read an article on me and that I had "slammed my CIO" (CIO is Chief Information Officer). Since Kevin is a really smart guy, if he misread the article entitled "Metadata Dreams Adrift", then I am sure many others have also.

    Of course, the misunderstanding comes from the pullout quote where I say, "The Homeland Security Department’s cross-cutting mission will not be successful without cross-cutting IT. ... the way you get that is with a strong CIO". That is only part of the story, so let me add the 'rest of the story' here to clear this up. First, I was absolutely NOT slamming my CIO. The comment was made about the position of CIO and not the person who is the CIO. I was asserting, as many others have asserted, that in order for DHS to be successfully integrated, in this case specifically Information Technology integration, the CIO should have the component CIOs directly reporting to him in order to successfully carry out the numerous cross-cutting missions of the department. While that decision is way above my former paygrade, I said it because I believe that is what it will take to be effective. But don't get me wrong... I am proud of the work DHS is currently doing and the fine group of people working hard under extreme pressure and public scrutiny.

    At a later time, I will address the reasons why I left DHS in more detail ...

    Hope that clears things up... for those interested... In this blog, I will be dissecting the role of metadata within the federal government and what changes need to be made, both in industry and government, to have it fulfill its promise. Believe me, there are many, many changes that need to be made and we techies are a big part of the problem. The way we explain metadata and the current raft of products that support it lack the necessary focus and value proposition to go mainstream. I hope to change that.

    - Mike

    Friday, March 03, 2006

     

    What is this?

    Hi Everyone,

    This will be a blog on my personal experiences and strategic analysis of the art and science of designing metadata to productize and manage our vital data assets.

    - Mike
