Practical Metadata

Friday, February 12, 2010

Data.gov

Most of my time has been consumed analyzing data.gov data.

Please see the data.gov collaboration site and share your ideas on improving data.gov.

I have also posted an idea there on metadata, asking for help in identifying the right metadata elements for the data.gov template.

Hope to see you there!

# posted by Michael C. Daconta @ 2/12/2010 10:42:00 AM

Thursday, November 27, 2008

Digging out from the avalanche

Hi Everyone,

Been incredibly busy with the new business ... head underwater the last few months. Finally clawing my way back to the surface!

Though not about metadata, I read a great article a few days ago about a security expert named Kaminsky and how he discovered, and helped fix, a huge security hole in the Domain Name System. I liked this article on so many levels; I highly recommend you check it out.

In regard to metadata, most of my work these days revolves around IBM's InfoSphere product line, especially its business glossary and its ability to provide line-of-sight between your business end-users and your physical IT assets.

More soon ... - Mike

# posted by Michael C. Daconta @ 11/27/2008 10:23:00 AM

Sunday, June 08, 2008

Crazy busy!

Well, I recently joined a startup as its Chief Technology Officer.
See our web page for details!

# posted by Michael C. Daconta @ 6/08/2008 12:02:00 PM

Sunday, February 17, 2008

[LTMP] Link Title Metadata Proposal

Link title metadata has become commonplace on news aggregator/ranking sites like reddit, digg, delicious, and many others. A search today for "NSFW" on Google returned 7,350,000 hits.
For those new to link metadata, NSFW is an acronym for "Not Safe For Work". It is metadata placed in the title of a link to give readers additional information about the linked resource. A tag of "NSFW" means that end-users should NOT click on the link while at work, or they could get in trouble. Adding such descriptive metadata to the title of a link is a great service to end-users, and I applaud whoever started this trend. However, I have recently seen ambiguous and inconsistent variations in the use of link title metadata, which means it may be time for a standard. For example, I recently saw [PIC] used as a keyword; of course, PIC is not an acronym, and further down the same page was [Image], which is a much better description.

Why do we need a standard? Browsers and sites (like reddit) may want to rely on this metadata for their customers; for example, a site may want to place all NSFW content in a separate section. To do this, programs need to be able to rely on the metadata ... fortunately, by following a few simple rules we can accomplish this.

LTMP Proposed Standard:

Enclose Link Title Metadata keywords in brackets; for example, [NSFW] or [SFW].
Link Title Metadata should either be at the beginning or end of a link title.
Reserve all capitals in a keyword for acronyms. Use a capitalized keyword for non-acronyms; for example, [Image] or [Video] or [Song].
The community should develop a list of well-known LTM keywords. I have also seen [Tutorial], [], [Graph], [Cartoon], [Non-Linkjacked], etc. Here is an initial list with definitions.
Link Title Metadata can have multiple words in a keyword; however, multiple keywords should be separated by a ',' (comma). For example, [NSFW, Image].
Plural keywords should be used when appropriate; for example, [Image] versus [Images].
[optional] If you have an authoritative source for the keyword, that can optionally be added with a "source" keyword followed by a ':' (colon) followed by the metadata keyword. An example of this would be "[W3C:Standard]OWL 2.0 is out!".
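Because the rules are this simple, a program can apply them directly. Here is a minimal Python sketch of a parser, assuming titles follow rules 1, 2, 5 and the optional source rule as written above. The names `parse_title`, `Keyword` and `is_nsfw` are hypothetical, and the sketch does not enforce the capitalization or plural conventions.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

# Bracketed metadata block at the very start or very end of a title.
_LEADING = re.compile(r"^\s*\[([^\]]*)\]\s*")
_TRAILING = re.compile(r"\s*\[([^\]]*)\]\s*$")


@dataclass
class Keyword:
    name: str
    source: Optional[str] = None  # optional authority, e.g. "W3C" in [W3C:Standard]


@dataclass
class ParsedTitle:
    title: str
    keywords: List[Keyword] = field(default_factory=list)


def parse_keywords(block: str) -> List[Keyword]:
    """Split a bracket's contents on commas, then on an optional 'source:' prefix."""
    keywords = []
    for raw in block.split(","):
        raw = raw.strip()
        if not raw:
            continue
        source, sep, name = raw.partition(":")
        if sep:
            keywords.append(Keyword(name.strip(), source.strip()))
        else:
            keywords.append(Keyword(raw))
    return keywords


def parse_title(title: str) -> ParsedTitle:
    """Strip leading and trailing metadata blocks; return the bare title and its keywords."""
    keywords: List[Keyword] = []
    m = _LEADING.match(title)
    if m:
        keywords += parse_keywords(m.group(1))
        title = title[m.end():]
    m = _TRAILING.search(title)
    if m:
        keywords += parse_keywords(m.group(1))
        title = title[:m.start()]
    return ParsedTitle(title.strip(), keywords)


def is_nsfw(title: str) -> bool:
    """The kind of check a site might use to route links into a separate NSFW section."""
    return any(k.name.upper() == "NSFW" for k in parse_title(title).keywords)


if __name__ == "__main__":
    p = parse_title("[W3C:Standard]OWL 2.0 is out!")
    print(p.title, [(k.source, k.name) for k in p.keywords])
```

Brackets in the middle of a title are deliberately ignored, which is exactly why the rule about placing metadata at the beginning or end matters.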

What do you think?

Labels: proposal metadata standard

# posted by Michael C. Daconta @ 2/17/2008 12:25:00 PM

Saturday, February 02, 2008

Speaking on Metadata at the DAMA conference

I just finished my presentation on "Effective Metadata Design" for the upcoming DAMA conference in San Diego. This presentation is based on the Appendix of my new book from Outskirts Press.

The presentation delves into metadata design by focusing on seven specific techniques: identification, static measurement, dynamic measurement, degree, categorization, relationships, and commentary. Each technique is illustrated with examples of its application. Hope to see you there!
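To make the seven techniques a little more concrete, here is a hedged Python sketch of a single metadata record with one field per technique. The field names, the types, and the 0.0 to 1.0 scale used for "degree" are illustrative assumptions of mine, not the design presented in the book or the talk.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class MetadataRecord:
    # Identification: a unique, stable identifier for the described asset.
    identifier: str
    # Static measurement: a property fixed when the asset is created (assumed: size in bytes).
    size_bytes: int
    # Dynamic measurement: a property that changes over time (assumed: access count).
    access_count: int = 0
    # Degree: a graded assessment (assumed: reliability on a 0.0-1.0 scale).
    reliability: float = 0.5
    # Categorization: scheme name -> category within that scheme.
    categories: Dict[str, str] = field(default_factory=dict)
    # Relationships: relationship type -> identifiers of related assets.
    relationships: Dict[str, List[str]] = field(default_factory=dict)
    # Commentary: free-text notes from the people who use the asset.
    commentary: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        if not 0.0 <= self.reliability <= 1.0:
            raise ValueError("reliability must be between 0.0 and 1.0")

    def record_access(self) -> None:
        """Update the dynamic measurement each time the asset is used."""
        self.access_count += 1
```

Keeping the static and dynamic measurements in separate fields makes it obvious which values must be refreshed over time and which can be captured once.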

On another note, I will also be speaking at FOSE and a few other events in the works...

See you at the conferences!

Labels: metadata DAMA data management FOSE

# posted by Michael C. Daconta @ 2/02/2008 11:18:00 AM

Wednesday, November 21, 2007

Awarded United States Patent 7299408!

See the announcement on my website for details!
It is important to note that this directly relates to the concept of Information MVC and the case study in my new book: Information As Product.

# posted by Michael C. Daconta @ 11/21/2007 12:34:00 PM

Monday, November 12, 2007

Book Launch, GCN, DAMA and Cringely!

Lots of things going on lately! The top news would be the availability of my new book. You can order now at http://www.outskirtspress.com/daconta
So far, reviews are very positive, with several government folks considering it a "must-read"!

Second, my latest GCN column is out and discusses the high costs of ambiguity. In fact, it raises some issues relating to this blog, especially the ambiguity, for some, between metadata and data.

Third, I will be speaking at the next DAMA conference on effective metadata design. This talk will cover all aspects of metadata raised in my new book and teach practitioners how to take their metadata management to the next level! Hope to see you there!

Finally, Robert Cringely has a new article out talking about a new opportunity for Google as a metadata provider.

It is good to live in such interesting times! Best wishes, - Mike

# posted by Michael C. Daconta @ 11/12/2007 05:02:00 PM

Tuesday, October 16, 2007

Reviewers Wanted for my New Book!

Hi Everyone,
Are you interested in reviewing my new book? If so, go to my website and contact me via the contact form for details. Please put in the subject line [IAP REVIEWER]. I will not ask, question or comment on the contents of your review in any manner - that is your unfettered opinion (and I wouldn't have it any other way). If you post your review online, I will send you an autographed copy of the book.

The book covers many of the concepts put forth on this blog. Specifically, there are two key sections on metadata: Chapter 3 on the Information Catalog and Appendix A on Effective Metadata Design.

I will have many more posts and excerpts from the book in the following months.

Regards, - Mike

# posted by Michael C. Daconta @ 10/16/2007 09:18:00 PM

Sunday, August 19, 2007

Good Interview on XML and Data Management with Owen Ambur

An interview with Owen Ambur, a former senior architect in the Interior Department and leader of the XML Community of Practice, entitled "The Data Landscape," is available on GCN. I worked with Owen on the Data Reference Model (DRM) and other XML-related activities. Owen was a passionate XML advocate, and we held many superb discussions on how to fix data management. We both argued strongly for a more concrete approach to the DRM, with concrete XML schemas for the reporting of data assets. Good to hear that Owen is doing well and staying involved. It was a pleasure to work with him.

Labels: Owen Ambur XML data management

# posted by Michael C. Daconta @ 8/19/2007 08:18:00 AM

Thursday, August 16, 2007

Can Data Management help to Stop Fraud?

This article reports how a company fraudulently billed the Pentagon over $900,000 for shipping. Sounds to me like poor data validation. If the billing system used well-defined metadata fields, this type of anomaly should have been easily caught. Seems like the Pentagon should borrow some best practices from the fraud detection divisions of the credit card companies!
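What might "well-defined metadata fields" look like in practice? Here is a minimal Python sketch, assuming each invoice field carries a declared plausible range plus one cross-field sanity rule. The field names, limits, and the 10x shipping-to-value ratio are hypothetical; real limits would come from the billing domain.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class FieldRule:
    name: str
    min_value: float
    max_value: float


# Hypothetical field metadata for a shipping invoice.
RULES = [
    FieldRule("shipping_cost", 0.0, 10_000.0),
    FieldRule("item_value", 0.0, 1_000_000.0),
]

# Assumption: shipping should rarely cost more than 10x the value of what is shipped.
MAX_SHIPPING_TO_VALUE_RATIO = 10.0


def validate_invoice(invoice: dict) -> List[str]:
    """Return a list of problems; an empty list means the invoice passed."""
    problems = []
    for rule in RULES:
        value = invoice.get(rule.name)
        if not isinstance(value, (int, float)):
            problems.append(f"{rule.name}: missing or non-numeric")
        elif not rule.min_value <= value <= rule.max_value:
            problems.append(f"{rule.name}: {value} outside [{rule.min_value}, {rule.max_value}]")
    # Cross-field check: the kind of anomaly a simple range check alone might miss.
    cost, worth = invoice.get("shipping_cost"), invoice.get("item_value")
    if isinstance(cost, (int, float)) and isinstance(worth, (int, float)) and worth > 0:
        if cost / worth > MAX_SHIPPING_TO_VALUE_RATIO:
            problems.append(f"shipping_cost is {cost / worth:.0f}x item_value")
    return problems
```

Flagged invoices would go to a human reviewer rather than being rejected outright, much as a card issuer holds a suspicious charge for confirmation.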

# posted by Michael C. Daconta @ 8/16/2007 07:34:00 PM

Sunday, June 24, 2007

New GCN Column

I have been swamped and remiss in updating this blog. My latest Government Computer News column is out. The column series is called "Reality Check", and I always strive to drill down beneath the symptoms to the root causes (revealing the underlying "realities"). In this month's installment, I look for the killer app in SOA. Next month, I will be examining Rich Internet Applications (JavaFX, Flex, Silverlight, AJAX). Enjoy!

# posted by Michael C. Daconta @ 6/24/2007 06:23:00 AM

Monday, April 02, 2007

Step one is Transparency!

New article out over at GCN on transparency. It is critical to understand that, without accessibility, metadata catalogs are worthless. So, transparency comes first.
Best wishes, - Mike

# posted by Michael C. Daconta @ 4/02/2007 04:48:00 PM

Thursday, March 15, 2007

Google's Approach to Image Metadata

See how Google is trying to get better image search results: http://images.google.com/imagelabeler/.

This application essentially turns image labeling into a "game" in which you and a random Internet partner label random images to help improve the quality of Google's image search results. Very interesting.

One of my friends said, "They've actually made assigning metadata to their images kind of fun!"

-Kevin T. Smith

# posted by Kevin T. Smith @ 3/15/2007 10:57:00 AM

Friday, February 16, 2007

Simpler, XHTML Friendly Metadata Format: RDFa

Good article on XML.com about RDFa. Finally, the benefit of triples in an easy-to-use syntax that we can leverage on the web. Can someone please tell Google right away that we need them to announce support for a "Document-Date" RDFa field? Then we could do a Google search and check a box to "sort by MOST RECENT", and we would not have to waste hours reading old documents when we know a more recent and relevant one exists!
That current pitfall just bit me when I was researching whether the SBA was going to raise the size limits for small businesses. I had heard there was some recent news on this, but the Google search kept bringing up old material from 2004. Very frustrating! PageRank is not always the most relevant criterion for search... remember, it is all about the 5W's.
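Once a date field is available, the "sort by MOST RECENT" checkbox is simple to implement. Here is a hedged Python sketch, assuming the engine has already extracted a hypothetical "Document-Date" value (as wished for above) from each page's RDFa; dated results come first, newest to oldest, with relevance breaking ties and undated pages pushed to the end.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class SearchResult:
    url: str
    rank: float                    # the engine's relevance score (e.g., a PageRank-style value)
    document_date: Optional[date]  # hypothetical "Document-Date" value extracted from RDFa


def sort_by_most_recent(results: List[SearchResult]) -> List[SearchResult]:
    """Newest dated documents first (relevance breaks ties); undated ones last, by relevance."""
    dated = [r for r in results if r.document_date is not None]
    undated = [r for r in results if r.document_date is None]
    dated.sort(key=lambda r: (r.document_date, r.rank), reverse=True)
    undated.sort(key=lambda r: r.rank, reverse=True)
    return dated + undated
```

Putting undated pages last, instead of dropping them, keeps the checkbox from hiding relevant documents that simply lack the metadata.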

# posted by Michael C. Daconta @ 2/16/2007 12:19:00 PM

Wednesday, February 14, 2007

The Machine is Us/ing Us

Jim Feagans just shot me a URL for a neat video on the evolution of "smart data" on the web. It is very cool that an Anthropology teacher gets this! He comes to the conclusion that it is right and good (and beneficial) to teach the computer.
Thanks Jim!

Here is a transcript of the video:
Text is linear
Text is unlinear
Text is said to be unlinear
Text is often said to be unlinear
Text is unlinear when written on paper
Digital text is different.

Digital text is more flexible.
Digital text is moveable.
Digital text is above all…hyper.
Digital hypertext is above all…
hypertext is above all…
hypertext can link
hypertext can link
here
here
or here…
virtually anywhere
anywhere virtually
anywhere virtual
The WayBack Machine
http://yahoo.com
Take Me Back
Oct 17, 1996
Yahoo
View Source
Most early websites were written in HTML
HTML
HTML was designed to define the structure of a web document.
p is a structural element referring to “paragraph”
LI
LI is also a structural element referring to “List Item”
As HTML expanded, more elements were added.
Including stylistic elements like B for bold and I for italics
Such elements defined how content would be formatted.
In other words, form and content became inseparable in HTML
Digital Text can do better.
Form and content can be separated.
http://www.cnn.com
RSS XML
View Source
XML was designed to do just that.
http://www.cnn.com/?eref=rss_topstories
same with
CNN.com
and
and virtually all other elements in this document.
They describe the content, not the form.
So the data can be exported,
free of formatting constraints.
Latest News
Anthro Blogs (124)
Savage Minds
8apps: Social Networking for Productive People
WORLD CHANGING ANOTHER WORLD IS HERE
Anthro Journals (124)
University of California Press
Journals Digital Publishing
Current Anthropology
AESonline.org
Google
With form separated from content, users did not need to know complicated code to upload content to the web,
I’m Feeling Lucky
Create Blog
Name Your Blog
Beyond Etext
http://beyondetext.blogspot.com
Choose a template
Your blog has been created!
Monday, January 29, 2007
Hello World!
POSTED BY PROFESSOR WESCH AT 8:14 PM 0 COMMENTS
There’s a blog born every half second
and it’s not just text…Search
YouTube
Broadcast Yourself
This is a video response to The Beauty of Being Human
flickr
Ahoy mwesch!
Upload Photos
Anthropology club
Created by you.
KSU Anthropology club
Club Photos
Google
XML facilitates automated data exchange
two sites can “mash” data together
flickr maps
I’m Feeling Lucky
Limelight
Fluffy and white
Brushy Creek
Tokyo Delve’s Sushi B..
Who will organize all of this data?
TAG
del.icio.us
digital ethnography hypermedia anthropology
save
Who will organize all of this data?
We will.
You will.
Google
XML + U & Me create a database-backed web
a database-backed web is different
the web is different
the web
we are the web
I’m Feeling Lucky
WIRED
We Are the Web
by Kevin Kelly
“When we post and then tag pictures
teaching the Machine to give names,
we are teaching the Machine.
Each time we forge a link,
we teach it an idea.
Think of the 100 billion times per day humans click on a Web page
teaching the Machine”
the Machine
Diigo
Highlight
Highlight and Sticky note
Mwesch’s private note
the machine is us
Digital text is no longer just linking information…
Hypertext is no longer just linking information…
The Web is no longer just linking information…
The Web is linking people…
Web 2.0 is linking people…
…people sharing, tracing, and collaborating…
Wikipedia
Web 2.0
edit this page
We’ll need to rethink a few things…
We’ll need to rethink copyright
We’ll need to rethink authorship
We’ll need to rethink identity
We’ll need to rethink ethics
We’ll need to rethink aesthetics
We’ll need to rethink rhetorics
We’ll need to rethink governance
We’ll need to rethink privacy
We’ll need to rethink commerce
We’ll need to rethink love
We’ll need to rethink family
We’ll need to rethink ourselves.
by Michael Wesch
Assistant Professor of Cultural Anthropology
Kansas State University

# posted by Michael C. Daconta @ 2/14/2007 01:00:00 PM

Wednesday, February 07, 2007

Metadata about Ontology Metadata

I am at the RSA conference this week, where I am speaking about SOA Security. In a few months, I am going to speak with Eric Monk at Semtech 2007 about techniques for associating security classification and other metadata with subject-predicate-object triples in persistent stores without using reification. I have written about only one approach (and it is kind of a hack), so let me know what you think: http://home.comcast.net/~kevintsmith/resteasy.html
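For readers new to the problem, here is a hedged Python sketch of one commonly discussed alternative to reification, in the spirit of named graphs: each triple is stored with a fourth element naming its graph, and the classification is attached to the graph rather than to a reified statement. This is not necessarily the approach in the linked article; the class names, graph names, and the four-level scheme are illustrative assumptions.

```python
from typing import Dict, List, NamedTuple, Optional, Set, Tuple

LEVELS = ["UNCLASSIFIED", "CONFIDENTIAL", "SECRET", "TOP SECRET"]


class Quad(NamedTuple):
    subject: str
    predicate: str
    obj: str
    graph: str  # name of the graph that holds this statement


class QuadStore:
    def __init__(self) -> None:
        self.quads: Set[Quad] = set()
        self.graph_classification: Dict[str, str] = {}

    def classify_graph(self, graph: str, level: str) -> None:
        """Attach security metadata to a whole graph instead of reifying each triple."""
        if level not in LEVELS:
            raise ValueError(f"unknown level: {level}")
        self.graph_classification[graph] = level

    def add(self, s: str, p: str, o: str, graph: str) -> None:
        self.quads.add(Quad(s, p, o, graph))

    def visible(self, clearance: str, s: Optional[str] = None,
                p: Optional[str] = None, o: Optional[str] = None) -> List[Tuple[str, str, str]]:
        """Return matching triples whose graph is labeled at or below the clearance."""
        rank = LEVELS.index(clearance)
        out = []
        for q in self.quads:
            level = self.graph_classification.get(q.graph)
            if level is None or LEVELS.index(level) > rank:
                continue  # unlabeled or above clearance: hide it
            if (s is None or q.subject == s) and (p is None or q.predicate == p) \
                    and (o is None or q.obj == o):
                out.append((q.subject, q.predicate, q.obj))
        return sorted(out)
```

Treating unlabeled graphs as invisible is a deliberately conservative choice; a real store would also need rules for triples that appear in more than one graph.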

-Kevin T. Smith

# posted by Kevin T. Smith @ 2/07/2007 05:49:00 PM

Tuesday, January 02, 2007

The semantics of Microformats

Very interesting blog post by Alex Faaborg, a Mozilla engineer, on incorporating microformats into Mozilla. I have stressed focusing on the interrogatives in many conference talks, and I encourage everyone to download the Operator plug-in to try this out.

# posted by Michael C. Daconta @ 1/02/2007 02:00:00 PM

Sunday, December 17, 2006

Metadata Experts Wanted!

Hi Everyone,

I am busily working on my next book. In fact, I will be taking two weeks off around the holidays to make some serious headway. If you are interested in reviewing chapters for the book, email me at mdaconta AT oberonassociates.com (no mailto link to avoid spam robots). I plan on having the first batch of chapters ready for review in mid-January. I have changed the scope of this book several times and refined its approach and content. I am really happy with the book's new direction and am confident this will be an important book (note: if you think you know the subject matter, you probably don't, because I have very recently changed the book title and outline). It is really shaping up nicely. I am only looking for a select few reviewers because I will be mailing out actual printed copies instead of electronic ones. If I know you, there is no need to send a resume; if we have never met and you want to be considered, email me your resume. I will get back to you after the holidays ...

Thanks, - Mike

# posted by Michael C. Daconta @ 12/17/2006 10:12:00 AM

Saturday, November 25, 2006

Awesome Semantic Web Application!!

Don't walk, run over to pandora.com for a glimpse at a robust music ontology in action! This website is a recommender system that uses a music ontology (smartly called the Music Genome Project) to recommend music similar to your known favorites by decomposing your preferences into their component parts. I have been using this for a few days and my initial impression is that it works great! Kudos!
How many more "Genome" (i.e. Semantic Web) projects will we see in the coming years? If this is any indication, my guess is many, many ...
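To illustrate the general idea of "decomposing preferences into component parts", here is a minimal Python sketch: each song is a vector of trait scores, a listener profile averages the traits of known favorites, and candidates are ranked by cosine similarity. This is a generic content-based recommender, not Pandora's actual algorithm; the trait names and scores are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Traits = Dict[str, float]  # trait name -> strength, e.g. {"acoustic": 0.9}


def cosine(a: Traits, b: Traits) -> float:
    """Cosine similarity between two sparse trait vectors (0.0 if either is empty)."""
    keys = set(a) | set(b)
    dot = sum(a.get(k, 0.0) * b.get(k, 0.0) for k in keys)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def profile(favorites: List[Traits]) -> Traits:
    """Decompose the listener's taste into an average of their favorites' traits."""
    total: Traits = {}
    for song in favorites:
        for k, v in song.items():
            total[k] = total.get(k, 0.0) + v
    return {k: v / len(favorites) for k, v in total.items()}


def recommend(favorites: List[Traits], catalog: Dict[str, Traits],
              n: int = 2) -> List[Tuple[str, float]]:
    """Return the n catalog entries whose traits best match the listener profile."""
    p = profile(favorites)
    scored = [(title, cosine(p, traits)) for title, traits in catalog.items()]
    scored.sort(key=lambda t: t[1], reverse=True)
    return scored[:n]
```

The quality of such a system depends far more on the ontology (which traits exist and how consistently they are scored) than on the similarity arithmetic.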

# posted by Michael C. Daconta @ 11/25/2006 09:09:00 PM

Torching Doctorow: Part 1

Cory Doctorow wrote an article in 2001 called "Metacrap" where he purports to expose seven fatal flaws to reaching a "metatopia" where there is a "world of exhaustive, reliable metadata."

Before we dig into his seven "straw-men", let's examine a little metadata about Cory Doctorow to determine his qualifications for making such assertions. First, it is interesting that he does not post any qualifications or any references to back up the assertions in his article. We do not know how much, if any, metadata he has actually created. Second, Wikipedia claims that he only has a high school degree. Third, per his own website, he is certainly not a practicing IT professional. Given these three things, we must take his assertions with a big grain of salt. However, given the number of blog entries and links to his article, it is worthwhile to at least examine his arguments. Of course, publicity is not an indicator of truth, and I surmise that most links to his article merely commiserate over the fact that metadata is hard to do right. Fortunately, there are those of us who still believe that "hard" does not equate to "wrong" and that this is a temporary state due to lack of expertise. More on that later ... let's get back to Doctorow.

Straw-man #1. People Lie. Doctorow uses this to attack the reliability of metadata. His argument is that because metadata "lives in a competitive world", people will lie to gain advantage. Frankly, this is a ridiculous statement because NOT all metadata lives in a competitive world. In fact, the most important metadata, or enterprise metadata, will not live on the "wild, uncontrolled internet" but in the controlled, corporate intranet.
So, let's debunk this in a number of ways:
a. People without access to my metadata can lie all they want and it won't affect me.
b. People lie more when they can lie without attribution. That is why Wikipedia has so many problems (lack of attribution).
c. People lie about both data and metadata but Doctorow is not saying we should distrust the entire internet.
So, the real point here is that non-attributed data and metadata, of any type, are untrustworthy. Fortunately, every good metadata development process includes attribution via governance, so this is truly a tangential argument (at best).
d. Metadata is not the victim of people lying but a cure for that problem. For example, a "reliability" metadata attribute, used in a number of very credible organizations, is quite effective in measuring the trustworthiness of an information source. Of course, in some cases, capturing lineage can replace additional metadata attributes for judging reliability.

Daconta's Counterpoint #1. Reliable data and metadata is properly attributed.
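Counterpoint #1 can be expressed as a simple filtering rule. Here is a hedged Python sketch, assuming every assertion records who made it and a governance registry assigns each source a reliability rating; the source names and the 0.0 to 1.0 scale are hypothetical stand-ins for whatever rating scheme an organization actually uses.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Assertion:
    subject: str
    value: str
    source: Optional[str]  # who asserted it; None means non-attributed


# Hypothetical governance registry: source -> reliability rating (0.0 to 1.0).
SOURCE_RELIABILITY: Dict[str, float] = {
    "records-office": 0.9,
    "partner-feed": 0.6,
}


def trusted(assertions: List[Assertion], threshold: float = 0.7) -> List[Assertion]:
    """Keep only attributed assertions whose source meets the reliability threshold."""
    kept = []
    for a in assertions:
        if a.source is None:
            continue  # non-attributed: untrustworthy by default
        if SOURCE_RELIABILITY.get(a.source, 0.0) >= threshold:  # unknown sources rate 0.0
            kept.append(a)
    return kept
```

Note that the lie itself is not detected here; what changes is that an unattributed or poorly rated claim never reaches the consumer as if it were fact.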

Next time we will examine Straw-man #2...

# posted by Michael C. Daconta @ 11/25/2006 07:47:00 PM

Sunday, November 05, 2006

Geospatial Metadata

Ok, moving on from the application metadata example (which, by the way, was very helpful in ferreting out the principles of metadata design), I am now researching interrogative metadata (who, what, when, where, why). The basic principle here is that the interrogatives are powerful vertices, or axes, for describing our data assets. To that end, I am starting with "where" and geospatial metadata. Of course, there is a wealth of information on creating and using geospatial metadata. The Federal Geographic Data Committee has a good site here. Also, Geospatial One-Stop has some good pages on metadata creation.
More on this soon...
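As a starting point, here is a minimal Python sketch of a record organized around the interrogatives, with "where" expressed as a simple bounding box of west/east/south/north coordinates (a common way geospatial metadata states extent). The class and field names are hypothetical, and the overlap test ignores boxes that cross the antimeridian.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass(frozen=True)
class BoundingBox:
    west: float   # longitude, decimal degrees
    east: float
    south: float  # latitude, decimal degrees
    north: float

    def intersects(self, other: "BoundingBox") -> bool:
        # Simple overlap test; does not handle boxes crossing the antimeridian.
        return (self.west <= other.east and other.west <= self.east
                and self.south <= other.north and other.south <= self.north)


@dataclass
class InterrogativeMetadata:
    who: str            # creator or responsible party
    what: str           # subject or description of the asset
    when: date          # date the content refers to
    where: BoundingBox  # spatial extent
    why: str            # purpose the asset was created for


def find_where(records: List[InterrogativeMetadata],
               area: BoundingBox) -> List[InterrogativeMetadata]:
    """Answer a 'where' question: which assets cover any part of the given area?"""
    return [r for r in records if r.where.intersects(area)]
```

The same pattern would extend to the other axes, for example a date-range filter over `when` or a registry lookup over `who`.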

# posted by Michael C. Daconta @ 11/05/2006 04:19:00 PM

Sunday, October 01, 2006

Application Metadata and an Ontology

In ferreting out the facets of metadata design, I am using "Application Metadata" as a test case. Part of the metadata for an application would be one or more categorization schemes. Here is the start of a categorization scheme for the "purpose" attribute in our application metadata. This simple taxonomy/ontology (created using Protege from Stanford) was developed via a simple categorization exercise on the applications in the Start menu of this laptop. The goal of formalizing this taxonomy was to make it a "formal taxonomy" as described in my article on XML.com.
There are some weaknesses in this taxonomy like the "Utilities" class because it is a generic grouping where its subclasses could easily intersect with other classes (which would not work if you wanted each subclass to be disjoint).
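The "Utilities" weakness can be checked mechanically. Here is a hedged Python sketch, assuming the taxonomy is stored as a child-to-parent map and each application is assigned one or more classes; it reports applications whose classes fall under more than one top-level branch, which is exactly what disjoint branches forbid. The class names and applications are hypothetical, not the actual taxonomy built in Protege.

```python
from typing import Dict, List, Set

# Hypothetical "purpose" taxonomy: child class -> parent class.
PARENT: Dict[str, str] = {
    "Development": "Application",
    "Office": "Application",
    "Internet": "Application",
    "Utilities": "Application",
    "Compression": "Utilities",
    "FileTransfer": "Utilities",
    "Browser": "Internet",
}


def top_level(cls: str, root: str = "Application") -> str:
    """Walk up the hierarchy until the parent is the root; return that branch."""
    while PARENT.get(cls, root) != root:
        cls = PARENT[cls]
    return cls


def disjointness_violations(assignments: Dict[str, Set[str]]) -> List[str]:
    """Applications classified under more than one top-level branch."""
    return sorted(app for app, classes in assignments.items()
                  if len({top_level(c) for c in classes}) > 1)
```

An FTP client is a natural offender: it is plausibly both a "FileTransfer" utility and an "Internet" application, which is the generic-grouping problem described above.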
This was a useful exercise that helped my research into metadata design and the facets of a good metadata record. If you have ideas on that, let me know or post comments here. Examples of metadata facets are identification, static measurement, dynamic measurement, categorization, etc.
Let's dig in and work out the details of metadata design! More soon...

# posted by Michael C. Daconta @ 10/01/2006 09:53:00 PM