Thursday, November 27, 2008
Digging out from the avalanche
Been incredibly busy with the new business ... head underwater the last few months. Finally clawing my way back to the surface!
Though not about metadata, read a great article a few days ago on a security expert named Kaminsky and how he discovered and helped fix a huge security hole in the domain name system. I liked this article on so many levels - I highly recommend you check it out.
In regard to metadata, most of my work these days revolves around IBM's InfoSphere product line, especially its Business Glossary and its ability to provide line-of-sight between your business end-users and your physical IT assets.
More soon ... - Mike
Sunday, June 08, 2008
Crazy busy!
Sunday, February 17, 2008
[LTMP] Link Title Metadata Proposal
For those new to link metadata, NSFW is an acronym for "Not Safe For Work". It is metadata placed in the title of a link to give readers additional information about the linked resource. A tag of "NSFW" means that the end-user should NOT click on the link while at work, or they could get in trouble. Adding such descriptive metadata to the title of a link is a great service to end-users, and I applaud whoever started this trend. However, I have recently seen ambiguous and inconsistent variations on the use of link title metadata, which means it may be time for a standard. For example, I recently saw [PIC] as a keyword; of course, PIC is not an acronym, and lower on the same page was [Image], which is a much better description. Why do we need a standard? Browsers and sites (like reddit) may want to rely on this metadata for their customers; for example, a site may want to place all NSFW content in a separate section. To do this, programs need to be able to rely on the metadata ... fortunately, by following a few simple rules we can accomplish this.
LTMP Proposed Standard:
- Enclose Link Title Metadata keywords in brackets; for example, [NSFW] or [SFW].
- Link Title Metadata should either be at the beginning or end of a link title.
- Reserve all capitals in a keyword for acronyms. Use a capitalized keyword for non-acronyms; for example, [Image] or [Video] or [Song].
- The community should develop a list of well-known LTM keywords. I have also seen [Tutorial], [Graph], [Cartoon], [Non-Linkjacked], etc. Here is an initial list with definitions.
- Link Title Metadata can have multiple words in a keyword; however, multiple keywords should be separated by a ',' (comma). For example, [NSFW, Image].
- Plural keywords should be used when appropriate; for example, [Image] versus [Images].
- [optional] If you have an authoritative source for the keyword, you can prefix the keyword with that source followed by a ':' (colon). An example of this would be "[W3C:Standard]OWL 2.0 is out!".
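To show that these rules are simple enough for programs to rely on, here is a minimal Python sketch of an LTMP parser. It is an illustration under my own assumptions, not part of the proposal: the names `parse_link_title` and `Keyword` are hypothetical, it only looks for one bracketed group at the start and one at the end of a title, and it treats a keyword as an acronym when it is all capitals.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple

# One bracketed metadata group, e.g. "[NSFW, Image]" or "[W3C:Standard]".
_GROUP = r"\[([^\[\]]+)\]"
# LTMP places metadata at the beginning or the end of a link title.
_LEADING = re.compile(r"^\s*" + _GROUP + r"\s*")
_TRAILING = re.compile(r"\s*" + _GROUP + r"\s*$")


@dataclass(frozen=True)
class Keyword:
    name: str
    source: Optional[str] = None  # optional authority, e.g. "W3C"

    @property
    def is_acronym(self) -> bool:
        # LTMP reserves all-capital keywords for acronyms.
        return self.name.isupper()


def _parse_group(body: str) -> List[Keyword]:
    """Split a bracket body on commas; 'source:keyword' sets the source."""
    keywords = []
    for part in body.split(","):
        part = part.strip()
        if not part:
            continue
        source, sep, name = part.partition(":")
        if sep:
            keywords.append(Keyword(name.strip(), source.strip()))
        else:
            keywords.append(Keyword(part))
    return keywords


def parse_link_title(title: str) -> Tuple[str, List[Keyword]]:
    """Return the plain title and any leading/trailing LTMP keywords."""
    keywords: List[Keyword] = []
    m = _LEADING.match(title)
    if m:
        keywords += _parse_group(m.group(1))
        title = title[m.end():]
    m = _TRAILING.search(title)
    if m:
        keywords += _parse_group(m.group(1))
        title = title[: m.start()]
    return title.strip(), keywords
```

For example, `parse_link_title("[W3C:Standard]OWL 2.0 is out!")` yields the plain title "OWL 2.0 is out!" and a single keyword "Standard" with source "W3C", which is exactly the kind of result a site would need to, say, route all [NSFW] links to a separate section.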
Labels: proposal metadata standard
Saturday, February 02, 2008
Speaking on Metadata at the DAMA conference
Labels: metadata DAMA data management FOSE
Wednesday, November 21, 2007
Awarded United States Patent 7299408!
It is important to note that this directly relates to the concept of Information MVC and the case study in my new book: Information As Product.
Monday, November 12, 2007
Book Launch, GCN, DAMA and Cringely!
So far, reviews are very positive, with several government folks considering it a "must-read"!
Second, my latest GCN column is out and discusses the high costs of ambiguity. In fact, it raises some issues relating to this blog, especially the ambiguity, for some, between metadata and data.
Third, I will be speaking at the next DAMA conference, on effective metadata design. This talk will cover all aspects of metadata raised in my new book and teach practitioners how to take their metadata management to the next level! Hope to see you there!
Finally, Robert Cringely has a new article out talking about a new opportunity for Google as a metadata provider.
It is good to live in such interesting times! Best wishes, - Mike
Tuesday, October 16, 2007
Reviewers Wanted for my New Book!
Are you interested in reviewing my new book? If so, go to my website and contact me via the contact form for details. Please put [IAP REVIEWER] in the subject line. I will not ask about, question or comment on the contents of your review in any manner; that is your unfettered opinion (and I wouldn't have it any other way). If you post your review online, I will send you an autographed copy of the book.
The book covers many of the concepts put forth on this blog. Specifically, there are two key sections on metadata: Chapter 3 on the Information Catalog and Appendix A on Effective Metadata Design.
I will have many more posts and excerpts from the book in the following months.
Regards, - Mike
Sunday, August 19, 2007
Good Interview on XML and Data Management with Owen Ambur
Labels: Owen Ambur XML data management
Thursday, August 16, 2007
Can Data Management help to Stop Fraud?
Sunday, June 24, 2007
New GCN Column
Monday, April 02, 2007
Step one is Transparency!
Best wishes, - Mike
Thursday, March 15, 2007
Google's Approach to Image Metadata
This application essentially makes a "game" for you and a random internet partner to label random images to help improve the quality of Google's image search results. Very interesting.
One of my friends said, "They've actually made assigning metadata to their images kind of fun!"
-Kevin T. Smith
Friday, February 16, 2007
Simpler, XHTML Friendly Metadata Format: RDFa
That current pitfall just bit me when I was researching whether the SBA was going to raise the size limits for small businesses. I had heard there was some recent news on this, but the Google search was bringing up old stuff from 2004. Very frustrating! PageRank is not always the most relevant criterion for search... remember, it is all about the 5W's.
Wednesday, February 14, 2007
The Machine is Us/ing Us
Thanks Jim!
Here is a transcript of the video:
Text is linear
Text is unlinear
Text is said to be unlinear
Text is often said to be unlinear
Text is unlinear when written on paper
Digital text is different.
Digital text is more flexible.
Digital text is moveable.
Digital text is above all…hyper.
Digital hypertext is above all…
hypertext is above all…
hypertext can link
hypertext can link
here
here
or here…
virtually anywhere
anywhere virtually
anywhere virtual
The WayBack Machine
http://yahoo.com
Take Me Back
Oct 17, 1996
Yahoo
View Source
Most early websites were written in HTML
HTML
HTML was designed to define the structure of a web document.
p is a structural element referring to “paragraph”
LI
LI is also a structural element referring to “List Item”
As HTML expanded, more elements were added.
Including stylistic elements like B for bold and I for italics
Such elements defined how content would be formatted.
In other words, form and content became inseparable in HTML
Digital Text can do better.
Form and content can be separated.
http://www.cnn.com
RSS XML
View Source
XML was designed to do just that.
http://www.cnn.com/?eref=rss_topstories
same with
CNN.com
and
and virtually all other elements in this document.
They describe the content, not the form.
So the data can be exported,
free of formatting constraints.
Latest News
Anthro Blogs (124)
Savage Minds
8apps: Social Networking for Productive People
WORLD CHANGING ANOTHER WORLD IS HERE
Anthro Journals (124)
University of California Press
Journals Digital Publishing
Current Anthropology
AESonline.org
With form separated from content, users did not need to know complicated code to upload content to the web,
I’m Feeling Lucky
Create Blog
Name Your Blog
Beyond Etext
http://beyondetext.blogspot.com
Choose a template
Your blog has been created!
Monday, January 29, 2007
Hello World!
POSTED BY PROFESSOR WESCH AT 8:14 PM 0 COMMENTS
There’s a blog born every half second
and it’s not just text…
Search
YouTube
Broadcast Yourself
This is a video response to The Beauty of Being Human
flickr
Ahoy mwesch!
Upload Photos
Anthropology club
Created by you.
KSU Anthropology club
Club Photos
XML facilitates automated data exchange
two sites can “mash” data together
flickr maps
I’m Feeling Lucky
Limelight
Fluffy and white
Brushy Creek
Tokyo Delve’s Sushi B..
Who will organize all of this data?
TAG
del.icio.us
digital ethnography hypermedia anthropology
save
Who will organize all of this data?
We will.
You will.
XML + U & Me create a database-backed web
a database-backed web is different
the web is different
the web
we are the web
I’m Feeling Lucky
WIRED
We Are the Web
by Kevin Kelly
“When we post and then tag pictures
teaching the Machine to give names,
we are teaching the Machine.
Each time we forge a link,
we teach it an idea.
Think of the 100 billion times per day humans click on a Web page
teaching the Machine”
the Machine
Diigo
Highlight
Highlight and Sticky note
Mwesch’s private note
the machine is us
Digital text is no longer just linking information…
Hypertext is no longer just linking information…
The Web is no longer just linking information…
The Web is linking people…
Web 2.0 is linking people…
…people sharing, tracing, and collaborating…
Wikipedia
Web 2.0
edit this page
We’ll need to rethink a few things…
We’ll need to rethink copyright
We’ll need to rethink authorship
We’ll need to rethink identity
We’ll need to rethink ethics
We’ll need to rethink aesthetics
We’ll need to rethink rhetorics
We’ll need to rethink governance
We’ll need to rethink privacy
We’ll need to rethink commerce
We’ll need to rethink love
We’ll need to rethink family
We’ll need to rethink ourselves.
by Michael Wesch
Assistant Professor of Cultural Anthropology
Kansas State University
Wednesday, February 07, 2007
Metadata about Ontology Metadata
-Kevin T. Smith
Tuesday, January 02, 2007
The semantics of Microformats
Sunday, December 17, 2006
Metadata Experts Wanted!
I am busily working on my next book. In fact, I will be taking two weeks off around the holidays to make some serious headway. If you are interested in reviewing chapters for the book, email me at mdaconta AT oberonassociates.com (no mailto link to avoid spam robots). I plan on having the first batch of chapters ready for review in mid-January. I have changed the scope of this book several times and refined its approach and content. I am really happy with the book's new direction and am confident this will be an important book (note: if you think you know the subject matter, you probably don't, because I have very recently changed the book title and outline). It is really shaping up nicely. I am only looking for a select few reviewers because I will be mailing out actual printed copies instead of electronic ones. If I know you, there is no need to send a resume; if we have never met and you want to be considered, email me your resume. I will get back to you after the holidays ...
Thanks, - Mike
Saturday, November 25, 2006
Awesome Semantic Web Application!!
How many more "Genome" (i.e. Semantic Web) projects will we see in the coming years? If this is any indication, my guess is many, many ...
Torching Doctorow: Part 1
Before we dig into his 7 "straw-men", let's examine a little metadata about Cory Doctorow to determine his qualifications for making such assertions. First, it is interesting that he does not post any qualifications or any references to back up the assertions in his article. We do not know how much, if any, metadata he has actually created. Second, Wikipedia claims that he has only a high school diploma. Third, he is certainly not a practicing IT professional, per his own website. Given these three points, we must take his assertions with a big grain of salt. However, given that a number of blog entries link to his article, it is worthwhile to at least examine his arguments. Of course, publicity is not an indicator of truth, and I surmise that most links to his article are merely people commiserating over the fact that metadata is hard to do right. Fortunately, there are those of us who still believe that "hard" does not equate to "wrong" and that this is a temporary state due to lack of expertise. More on that later ... let's get back to Doctorow.
Straw-man #1. People Lie. Doctorow uses this to attack the reliability of metadata. His argument is that because metadata "lives in a competitive world", people will lie to gain advantage. Frankly, this is a ridiculous statement because NOT all metadata lives in a competitive world. In fact, the most important metadata, enterprise metadata, will not live on the "wild, uncontrolled internet" but in the controlled, corporate intranet.
So, let's debunk this in a number of ways:
a. People without access to my metadata can lie all they want and it won't affect me.
b. People lie more when they can lie without attribution. That is why Wikipedia has so many problems (lack of attribution).
c. People lie about both data and metadata but Doctorow is not saying we should distrust the entire internet.
So, the real point here is that non-attributed data and metadata, of any type, are untrustworthy. Fortunately, every good metadata development process includes attribution via governance, so this is truly a tangential argument (at best).
d. Metadata is not the victim of people lying but a cure for that problem. For example, a metadata attribute of "reliability", which is used in a number of very credible organizations, is quite effective in measuring the trustworthiness of the source of information. Of course, in some cases, capturing lineage can replace additional metadata attributes for judging reliability.
Daconta's Counterpoint #1. Reliable data and metadata are properly attributed.
Next time we will examine Straw-man #2...
Sunday, November 05, 2006
Geospatial Metadata
Sunday, October 01, 2006
Application Metadata and an Ontology

In ferreting out the facets of metadata design, I am using "Application Metadata" as a test case. Part of the metadata for an application would be one or more categorization schemes. Here is the start of a categorization scheme for the "purpose" attribute in our application metadata. This simple taxonomy/ontology (created using Protege from Stanford) was developed via a simple categorization exercise on the applications in the Start menu of this laptop. The purpose of formalizing this taxonomy was to make it a "formal taxonomy" as expressed in my article on XML.com.
There are some weaknesses in this taxonomy, such as the "Utilities" class: it is a generic grouping whose subclasses could easily intersect with other classes (which would not work if you wanted each subclass to be disjoint).
This was a useful exercise that helped my research into metadata design and the facets of a good metadata record. If you have ideas on that, let me know or post comments here. Examples of metadata facets are identification, static measurement, dynamic measurement, categorization, etc.
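The disjointness problem with a catch-all class like "Utilities" can be checked mechanically once applications are assigned to classes. Below is a minimal Python sketch, assuming a taxonomy stored as a plain dictionary of class to subclasses; apart from "Utilities", every class and application name here is hypothetical and does not reproduce the actual Protege taxonomy.

```python
from itertools import combinations

# Hypothetical "purpose" taxonomy: each class maps to its direct subclasses.
# Only "Utilities" comes from the post; the other names are illustrative.
TAXONOMY = {
    "Application": ["Productivity", "Development", "Utilities"],
    "Productivity": ["WordProcessing", "Spreadsheet"],
    "Development": ["Editor", "Compiler"],
    "Utilities": ["Compression", "TextEditor"],
}

# Hypothetical applications and the class(es) each was placed in.
ASSIGNMENTS = {
    "plain_text_editor": {"Editor", "TextEditor"},
    "archiver": {"Compression"},
    "spreadsheet_app": {"Spreadsheet"},
}


def members(cls, taxonomy, assignments):
    """Applications classified under cls or any of its descendants."""
    found = {app for app, classes in assignments.items() if cls in classes}
    for sub in taxonomy.get(cls, []):
        found |= members(sub, taxonomy, assignments)
    return found


def overlapping_siblings(taxonomy, assignments):
    """List (class_a, class_b, shared_apps) for sibling classes that are not disjoint."""
    overlaps = []
    for children in taxonomy.values():
        for a, b in combinations(children, 2):
            shared = members(a, taxonomy, assignments) & members(b, taxonomy, assignments)
            if shared:
                overlaps.append((a, b, shared))
    return overlaps


for a, b, shared in overlapping_siblings(TAXONOMY, ASSIGNMENTS):
    # prints: Development and Utilities are not disjoint: ['plain_text_editor']
    print(f"{a} and {b} are not disjoint: {sorted(shared)}")
```

Here the text editor lands under both "Development" and "Utilities", which is exactly the kind of intersection that breaks disjoint sibling classes. An OWL tool such as Protege can express the same constraint declaratively; this sketch just makes the check concrete.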
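One way to picture those facets is as groups of fields in a single application metadata record. The Python sketch below is only an illustration. The class name `ApplicationMetadata` and every field in it are hypothetical choices of mine, picked to show one field per facet named above.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional, Set


@dataclass
class ApplicationMetadata:
    """Hypothetical application metadata record grouped by facet."""

    # Identification facet: what the application is.
    name: str
    version: str
    # Static measurement facet: properties fixed at install time.
    install_size_mb: float
    # Dynamic measurement facet: properties that change as the app is used.
    launch_count: int = 0
    last_used: Optional[datetime] = None
    # Categorization facet: class names from the "purpose" taxonomy.
    purpose: Set[str] = field(default_factory=set)

    def record_launch(self, when: datetime) -> None:
        """Update the dynamic measurement facet on each launch."""
        self.launch_count += 1
        self.last_used = when


editor = ApplicationMetadata("plain_text_editor", "1.0", 2.5, purpose={"Editor"})
editor.record_launch(datetime(2006, 10, 1, 9, 30))
```

Separating the facets this way makes it clear which fields are set once (identification, static measurement), which are updated by monitoring (dynamic measurement), and which depend on a shared classification scheme (categorization).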
Let's dig in and work out the details of metadata design! More soon...
Saturday, September 16, 2006
Solving the All-Data Problem #1: The Data Optimization Pyramid

The figure on the right depicts the "data optimization pyramid". This is the key scoping strategy that all organizations should use to apply reasonable management techniques to their data, starting from the realization that "all data is not equal". This means that you do not apply management techniques to all your data, because not all of your data is meaningful or relevant. The key is developing an understanding and a management strategy that allow relevant information to rise to the top. This may mean that you apply some brute-force techniques to the Unmanaged data (like unstructured data) to assist in the "bubbling up" process. On the other side of the coin, labor-intensive tasks like tagging are reserved for a smaller subset of the data. OK, you should have noticed that the pyramid is not complete ... here is where I need your help to find a good name for the top tier. At the top is critical data: data that you want to automate rules against, that requires a guaranteed level of fidelity, or that is highly relevant to a particular high-value, ad hoc community of interest. So, what do you think that top tier should be called?
Some options:
- Augmented Data
- Refined Data
- Formal Data
- Critical Data
Comments very welcome ...