Why OWL triples matter
The Portable Ontology Revolution in Domain Knowledge Representation

Stu Baurmann - July 17, 2005

As provocation, I'll mention my opinion that movement towards representing knowledge with triples will
turn out to be a particularly important trend in the modern history of computing, as demonstrated by the
current emergence into the mainstream of RDF and OWL-enabled knowledge infrastructure such as
Protege, SWOOP, Jena, RDFGateway, Kowari, and many other tools, both open-source and commercial.

Why is "knowledge" on the large scale really happening this time? Isn't this the same rosy
immediate future we've heard about for 50+ years in both the popular culture and theoretical
computing, "artificial intelligence" in the form of emergently intelligent, sentient computers that
have personalities and want to be your friend and/or destroy all humans, blah blah blah?

Yes and no. I think what has changed is that a convergence of various storylines is leading
us into an age where we are able to be realistic and practical about some elementary forms
of knowledge encoding which strike a balance between power of expression and practical,
manageable applicability. One storyline regards our continued improvement in understanding
of domain-specific knowledge representation techniques applied as part of a value-generating
process, when approached cautiously and with metrics in hand. Another storyline is the
day-by-day improvement in quality and number of integratable web resources such as Wikipedia,
Google, Amazon, and countless others, with their broad variety of business models. A third, parallel
storyline concerns the general maturation of what we call "information technology", both within the
profession and in its role as a principal segment of the North American economy and culture.
Neil Postman would probably say that information technology has been the prime driver of the
world economy and culture since at least Gutenberg, but that's another story (any Posties out there?).

Okay, so given those storylines as context, what I'm saying is that we're currently moving up an order of
magnitude in our ability to represent and share human knowledge, which is progress
in our power as a society (perhaps not exactly what our "society" most needs right now, but
technology is currently on its own calendar to a great extent) that is independent of our computers'
capability for gee-whowzers so-called Artificial Intelligence features. Confusing these two subjects -
"Knowledge Management" and "Artificial Intelligence" - is a source of great current misunderstanding
in our profession's relationship to the econoculture at large.

AI, that is, computer inference of new information based on inputs, is a nifty idea that inflames the
imagination to the point of triggering various positive/negative fantasies. That said, limited AI features
can be implemented today, with appropriate investment in a limited knowledge-engineering domain.
But regardless of what one thinks about AI, I think we should carefully consider the importance
of human knowledge representation to solving our currently gigantic problems with complexity,
accuracy, and productivity of software applications for organized human activities. I say this
in part out of painful personal experience as a software+process consultant for many years.
The pervasive task in my consulting career has been helping organizations overcome bottlenecks in
the understanding of themselves - in every case this turns out to dwarf the complexity of the particular
technical problem my client sub-unit is attempting to address. I am writing about this now because
I feel the need to declare for the benefit of my naturally suspicious peers:

This stuff works

...having in recent years led the adoption of semantic technology for a few large corporate projects,
and having thereby confirmed in my own mind that this toolset and methodology indicate a promising
direction for practical future work in addressing a large slate of thorny business-of-information
problems, though certainly not a panacea in itself. I see this semantic triples stuff as not just fun,
interesting, powerful technology, but as an important enabler for structural reform of many
organizational processes requiring flexible handling of a diversity of circumstances.

OK, so why am I hammering on this point? Data warehouses and business intelligence dashboards
and expert systems and whatnot have been around for a while, right? What's changing now? Why all
the sweet-smelling hot laundry about Knowledge Representation? Who am I shilling for? Why
LogicU, now, Mr Westerner Guy?

Well, the short answer is that triples, 3-ary relationships, turn out to hit a kind of sweet spot in the
squishy underbelly of today's organizational knowledge beast. The importance or goodness of
RDF as a technology and knowledge model can be pontificated about theoretically on any
side you like, but my point is that, in my experience, adopting triple-store representations
is a close-to-optimal extension of the presently conventional software construction
methodologies. This optimality (within certain assumptions) arises because this adoption is both:

  1. A large enough change in expressiveness to permit a true quantum improvement in system capability and quality (this I've seen firsthand; perhaps it's easier to show than tell)
  2. A small enough change in structure to be digestible culturally within the profession and in our user community (witness the W3C semantic-web standards activity)

Either of those points can be challenged, and I'm interested to see which is more contentious among
my peers, so have at it!

Alright, so I've told you what my point is. Now, to flesh it out, let's go back and understand triples a
little better. What's so great about 'em compared to the way most of us work with our information
sets today?

Well, if you've studied a bit of algebraic topology (it's OK if you haven't, ask me or someone else to
explain sometime, or ask Wikipedia), you know that the dimension of a space is a crucial parameter in
determining what can be represented in it. For a long time, we've gotten by in our imperative computer
programs with an abundance of 2-ary relationships, that is, variations on the name-value pair. If you've
bothered to read this far, I'm sure you know what I am talking about, and you are well familiar with
the abundance of NV-pair-like-things in the software you have worked on. But wait, what about
abstract datatypes, objects, and relational databases, aren't those n-ary relationships? Well, yes
and no. Yes in absolute, theoretical, structural terms, but No in terms of the expressive
semantics generally available to the programmer.

Huh? Beespresso Demantics? What you talkin bout, Hot Laundry Man? OK, OK, settle down, kiddos.
Think about what an object instance or an SQL row represents in its fields or columns: it is a set of
name-value pairs. Each name is the name of a field or column, and each value is either a primitive or a
compound construct (sub-object, array, etc) which is itself addressable as name-value pairs.
This may not be the representation in memory, but that is irrelevant to my point, which is that
a conventional imperative program must be described in terms of navigation of contained name-value
pairs in order to execute and do useful operations.
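To make the name-value picture concrete, here is a minimal sketch in Python. The record, its field names, and the `to_triples` helper are all hypothetical, invented for illustration. It shows that once the owning entity is named explicitly, each contained name-value pair reads naturally as a (subject, predicate, object) triple.

```python
# A hypothetical record, shaped the way an object instance or a joined
# SQL row would hold it: names mapped to primitive or compound values.
order = {
    "id": "order42",
    "customer": {"id": "cust7", "name": "Acme"},
    "total": 99.5,
}

def to_triples(record):
    """Flatten nested name-value pairs into (subject, predicate, object) triples.

    Assumes every record carries an "id" field that names its subject.
    """
    subject = record["id"]
    triples = []
    for name, value in record.items():
        if name == "id":
            continue  # the id is the subject itself, not a statement about it
        if isinstance(value, dict):
            # A compound value becomes a link to another subject,
            # plus that subject's own pairs.
            triples.append((subject, name, value["id"]))
            triples.extend(to_triples(value))
        else:
            triples.append((subject, name, value))
    return triples

triples = to_triples(order)
```

In the nested form the program has to know the path (order, then customer, then name) in order to navigate; in the flattened form every statement stands on its own and can be matched without that knowledge.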

More importantly, the program is always constructed in terms of some expectation about the
types of the values of most of the pairs. Some variation in these types is generally permitted
(e.g. with object-oriented inheritance, or a field/column called "type", etc.), and it is management
of this variation across the program's subsystems and workflow and GUI that provides the
sustenance of the modern programmer (or UML modeler). However, it is only in advanced,
more research-oriented environments that a programmer has full access to the typing model
of all information at runtime. For example, Java objects and C++ objects cannot easily "change
type" in a running program. On the other hand, with today's OWL technology, it is in fact
possible to derive/calculate the type of an acquired piece of information, while remaining within a
JVM running on Linux or a Windows/C#/.NET environment or similar robust and conventionally
deployed software platforms. This dynamically computed type can then be used to drive behavior
as declaratively configured by mortal engineers using standards-based technology (not mysteriously
evoked by that one brilliant programmer who really understands your magical code-generator!).
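As a rough illustration of a computed type, the sketch below applies two RDFS-style rules (property domain and subclass propagation) to a handful of triples until nothing new can be derived. This is far less than a real OWL reasoner does, and it is plain Python rather than Jena or any other toolkit; the `ex:` names and the `infer_types` and `types_of` helpers are assumptions made up for the example.

```python
TYPE, DOMAIN, SUBCLASS = "rdf:type", "rdfs:domain", "rdfs:subClassOf"

def infer_types(triples):
    """Apply two RDFS-style rules until no new triples appear."""
    facts = set(triples)
    while True:
        new = set()
        for s, p, o in facts:
            # Rule 1: using a property implies membership in its domain class.
            for prop, d, cls in facts:
                if d == DOMAIN and prop == p:
                    new.add((s, TYPE, cls))
            # Rule 2: class membership propagates up the subclass hierarchy.
            if p == TYPE:
                for cls, sc, parent in facts:
                    if sc == SUBCLASS and cls == o:
                        new.add((s, TYPE, parent))
        if new <= facts:
            return facts  # fixed point reached
        facts |= new

def types_of(facts, subject):
    """Collect every class the subject is known to belong to."""
    return {o for s, p, o in facts if s == subject and p == TYPE}

# Hypothetical model: nobody ever asserts that alice is an Employee.
model = [
    ("ex:hasSupervisor", DOMAIN, "ex:Employee"),
    ("ex:Employee", SUBCLASS, "ex:Person"),
    ("ex:alice", "ex:hasSupervisor", "ex:bob"),
]
facts = infer_types(model)
alice_types = types_of(facts, "ex:alice")
```

Here `alice_types` comes out as both `ex:Employee` and `ex:Person`, derived purely from the fact that alice has a supervisor. A program could then dispatch on that computed set instead of on a class fixed at compile time.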

The "triples + kernel" approach is simply a generalization of imperative techniques so that the
useful behaviour currently described by program instructions (or UML sequence diagrams) is
instead provided by a small generic kernel which is configured by a knowledge model containing
assertions about relationships between entities. This knowledge model is manageable with techniques
that, with proper architecture and planning, are orders of magnitude more efficient in expressing and confirming
human intentions than are sets of UML models and Java/C# code with their attendant RDBMS
configuration and so on.

There are many qualifications applicable to these statements, but I hope you'll see my basic point.
BTW, the use of the term "kernel" implies a useful analogy with operating system construction, which I
hope is apparent. Let me know if it ain't.
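One way to picture "triples + kernel" is the toy below: a document workflow whose behaviour lives entirely in a list of triples, driven by a kernel of a few lines that knows nothing about documents. The states, events, and the `step` and `run` functions are hypothetical, and a real kernel would of course do much more than follow transitions.

```python
# Knowledge model: each triple asserts "in state S, event E leads to state O".
KNOWLEDGE = [
    ("ex:Draft", "ex:onSubmit", "ex:InReview"),
    ("ex:InReview", "ex:onApprove", "ex:Published"),
    ("ex:InReview", "ex:onReject", "ex:Draft"),
]

def step(knowledge, state, event):
    """Generic kernel operation: follow the first matching assertion."""
    for s, p, o in knowledge:
        if s == state and p == event:
            return o
    return state  # no assertion applies, so the state is unchanged

def run(knowledge, state, events):
    """Feed a sequence of events through the kernel."""
    for event in events:
        state = step(knowledge, state, event)
    return state

final = run(
    KNOWLEDGE,
    "ex:Draft",
    ["ex:onSubmit", "ex:onReject", "ex:onSubmit", "ex:onApprove"],
)
```

Changing the workflow means adding or removing a triple, not editing `step` or `run`, which is the sense in which the kernel stays small and generic while the knowledge model carries the intent.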

Recapitulating, the knowledge oriented approach to software system development is an application of
mathematical insight informed by understanding of human engineering dynamics. This understanding
has been gleaned from observation and participation in the current heavy-treading industrialized
approach to software definition, which involves establishment of numerous choke-points for expression
of intent, as well-motivated defense against the horrific risks that software projects have always
faced: incomplete and inconsistent requirements, inadequate testing, etc.

So, to boil it down: By committing to an architecture based on a small, simple kernel of operations
driven by a human-knowledge-containing triple store, we can build software systems that are more
accurate reflections of our needs and are more automatically testable. These two improvements
together give a quantum improvement in system quality as perceived by the end-users.
I further submit that the extent to which these improvements are undertaken as evolutionary or
revolutionary changes in an organization can be shaped by those managing the knowledge-oriented
project, depending on their priorities.

Now, to be clear about what is new-ish here, a general n-ary expressive power is available implicitly
in the SQL information model, no doubt about it. In fact, comprehending the role of SQL in modern
systems development is a key to understanding where we're going with this semantic technology.

When you are filtering sets of rows by using SQL WHERE clauses, you are, in fact, working with
an n-ary tuple model that is properly abstracted and accessible. Projecting this power forward
towards the user through your layers of MVC and ASPs/JSPs/EJBs and so forth is what much of
current system engineering is about, yes? My point is that the triples approach is an appropriate
modern grounding formalism for broad system engineering efforts beyond the transactional data
store, and can in fact be implemented in orthogonal harmony with the SQL approach. The parameters
of coexistence can be understood this way: triples can be formulated as simultaneously a limitation (in
dimension) and extension (in practical expressive power) of the modern conventional SQL-grounded
approach.

The limitation is this: Since all the triples in an RDF model can easily be stored in a single SQL table
with 3 columns, then all RDF operations are inherently emulatable as SQL operations, so there is a
kind of inherent backward-compatibility here, and we can see RDF as simply a subset of things SQL
already does just fine. But the extension comes in here: We're re-conceptualizing the form of the
information representation so that all relevant meta-data is now expressible within the same single
3-column table as the data-data, and both are mutable by the same core model read/write operations.
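The single-table idea can be sketched with Python's built-in `sqlite3` module. The table name, the `ex:` identifiers, and the `match` helper are assumptions for illustration, not any particular triple store's schema. The thing to notice is that schema-level statements and instance-level statements sit in the same three columns and are read and written by the same operations.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (subject TEXT, predicate TEXT, object TEXT)")
conn.executemany(
    "INSERT INTO triples VALUES (?, ?, ?)",
    [
        # Data about individuals.
        ("ex:alice", "rdf:type", "ex:Employee"),
        ("ex:alice", "ex:worksFor", "ex:acme"),
        # Metadata about classes and properties, in the very same table.
        ("ex:Employee", "rdfs:subClassOf", "ex:Person"),
        ("ex:worksFor", "rdfs:domain", "ex:Employee"),
    ],
)

def match(conn, s=None, p=None, o=None):
    """Return triples matching a pattern; None acts as a wildcard."""
    rows = conn.execute(
        "SELECT subject, predicate, object FROM triples "
        "WHERE (? IS NULL OR subject = ?) "
        "AND (? IS NULL OR predicate = ?) "
        "AND (? IS NULL OR object = ?) "
        "ORDER BY rowid",
        (s, s, p, p, o, o),
    )
    return rows.fetchall()

data_rows = match(conn, s="ex:alice")          # instance-level query
meta_rows = match(conn, p="rdfs:subClassOf")   # schema-level query, same call

# Evolving the "schema" is an ordinary INSERT, not an ALTER TABLE.
conn.execute(
    "INSERT INTO triples VALUES (?, ?, ?)",
    ("ex:Person", "rdfs:subClassOf", "ex:Agent"),
)
meta_after = match(conn, p="rdfs:subClassOf")
```

In an entity-per-table design the last step would be a DDL change made through a different interface; here it is one more row.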

That's it, folks. That's the key! You haveta get that last point in order to understand what I'm talking
about here. In a conventional SQL approach, we start with an ERD that identifies all the entities
in our model, and we implement that model in a bunch of RDBMS tables, one for each entity type.
Now, in any particular RDBMS you are able to work with metadata by querying system tables and so
on. Thus we can see an RDBMS platform as being equivalent to, or even a superset of, a simple
triple-store system, which makes sense since we know that an RDBMS is functionally sufficient
to implement just about any abstract model understandable by more than a handful of math-lovers.
However, now we must recognize some practical concerns: An RDBMS is generally a self-contained
entity that must be interfaced with rather than used as a total solution platform, unless you completely
commit to a particular solution framework (even your own) and thereby sacrifice portability and
interoperability.

The point of triple-based semantic technology is that it allows you to move knowledge models around
freely between programs and platforms, work within regular standards-based XML-oriented web software
environments, and so forth. Thus you get the power of working with type-flexible and mathematically
expressive 3-ary relationships, without having to "hit the database" or "program in the database" every
time you need this power. This is perhaps a subtle and highly-qualified point, but it turns out to
have huge potential ramifications in both development efficiency and system quality.
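As a small illustration of that portability, the sketch below writes a model out as one statement per line, in the IRI-only subset of the N-Triples syntax, and reads it back. The `dump` and `load` helpers and the example.org IRIs are hypothetical; literals, blank nodes, and escaping are deliberately left out, so this is nowhere near a complete parser.

```python
def dump(triples):
    """Serialize IRI-only triples, one '<s> <p> <o> .' statement per line."""
    return "".join(f"<{s}> <{p}> <{o}> .\n" for s, p, o in triples)

def load(text):
    """Parse text produced by dump() back into a list of triples."""
    triples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        s, p, o, dot = line.split()
        if dot != ".":
            raise ValueError(f"missing statement terminator: {line!r}")
        # Drop the surrounding angle brackets from each term.
        triples.append(tuple(term[1:-1] for term in (s, p, o)))
    return triples

model = [
    ("http://example.org/alice",
     "http://example.org/worksFor",
     "http://example.org/acme"),
    ("http://example.org/worksFor",
     "http://www.w3.org/2000/01/rdf-schema#domain",
     "http://example.org/Employee"),
]
text = dump(model)
restored = load(text)
```

The text in `text` is just lines in a file or an HTTP body, so any program on any platform that understands the format can pick up both the data and the metadata without sharing a database or a class library with the producer.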

This site contents © 2001-2005 by Scrutable Systems, Inc. Please send all questions and comments to xmlexpertise AT scrutable .com