
The Problem of Canonicity: Which document version is the “most correct?”

by David M. Doolin, PhD on July 7, 2009

Is this the right version?

So, who has the most recent copy of that contract everyone is supposed to sign? Wait… is the most recent copy the correct copy?

How would we know?

Every business has to manage paperwork. It’s pure overhead, a cost center, a necessary evil. Big businesses have big solutions: they hire expensive consultants, who recommend expensive software services requiring expensive training by (of course) the expensive consultants. The cost is amortized over a large product line, or may be rolled into acceptable indirect costs on government contracts.*

Small businesses have no such luxury. Dealing with overhead costs small business people real money, real fast…

…and one source of overhead costs is dealing with documents.

It’s 6:13 am July 6 2008 and I’m sitting at a picnic table in Fallen Leaf campground. DZ is still sleeping. It’s chilly. It’s clear skies, but the campground is buried in pines. The sun is coming in between the tree trunks and I’m shuffling around the picnic table attempting to stay within a narrow beam of sunlight.

We’re going to walk to the top of Horsetail Falls later, which is a story for a different day (and a different venue). As I brew my morning (9 shot) cuppa I ponder the mystery of attempting to keep track of documents in the computer age.

No, really I am.

I’m writing a list of bullet points outlining the problem of determining which documents in a collection of like documents should be considered canonical… on the back page of an article I was supposed to be reviewing for Numerical and Analytical Methods in Geomechanics. (I did the review at some point, and I’m sure the paper was published, but I don’t recall which one it was). DZ doesn’t really have a problem with this, because if I wasn’t geekin’ out… I’d have her out of the tent and on the trail waaaay before her preferred 10 am metamorphosis from sleeping bag slug to campground queen.

What is a canonical document?

Succinctly, a “canonical” document is the ultimate reference source, the source – by definition – where all the copies are supposed to come from.

Determining which document should be used as the reference source becomes a problem in an environment where documents may be copied… and perfect copies made of those copies… which can later be modified.

Monks transcribing by hand are an early example of this.

Xerox copy machines make the problem a little bit worse. Back in the typewriter days, it was always possible to tell the difference between typed copy and “copied” copy. Even when the originals are printed out of a laser printer, the copies generally have that “copied” look about them.

The electronic age geometrically increases the complexity. Copies can “look perfect” yet have significant differences in the text.
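To make the “looks perfect, reads differently” problem concrete, here is a minimal sketch in Python that lists only the lines that differ between two versions of a text. It uses the standard library’s difflib module; the function name changed_lines and the sample contract text are made up for illustration.

```python
import difflib


def changed_lines(old_text: str, new_text: str) -> list[str]:
    """Return only the removed ("- ") and added ("+ ") lines between two versions."""
    diff = difflib.ndiff(old_text.splitlines(), new_text.splitlines())
    # ndiff also emits unchanged lines ("  ") and hint lines ("? "); drop those.
    return [line for line in diff if line.startswith(("+ ", "- "))]


if __name__ == "__main__":
    # Hypothetical contract text: one character differs between the copies.
    copy_a = "Payment due in 30 days.\nSigned: A. Smith"
    copy_b = "Payment due in 90 days.\nSigned: A. Smith"
    for line in changed_lines(copy_a, copy_b):
        print(line)
```

This only works on plain text. Comparing Word or PDF files would first require extracting their text, which is outside this sketch.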

Computer solves old problems, creates new problems

When electronic documents are passed around, and multiple people have the authority to make changes to the documents, the problem of determining which version is the correct version becomes very important.

For example, consider the history of this blog post. It started as (1) bullet points on the back of some scratch paper, moved to (2) notes in my personal wiki, and ended up as (3) this blog post.

Which is canonical? That depends on the context: where a version falls in the document’s history, and where its medium ranks in a hierarchy of canonicity (notes, wiki, svn repo, publication, and so on).

Say what?

That’s a lot of stuff, what’s it all mean? And how to manage all this complexity?

Good questions.

Here’s how I do it:

  • Allow fuzziness. If it doesn’t really matter which of several very similar versions is correct, I don’t worry about it. I’ll use the one on hand if it works well enough.
  • Archive the document when possible, using software such as Subversion.
  • During idea generation and gestation I don’t worry too much about which documents are canonical, as long as references and credits are maintained (when necessary). Good ideas have a certain “stickiness,” and if they don’t stick, they probably weren’t very good anyway.
  • In rare cases where it matters, scans and screenshots can be taken and stored.

Establishing a canonical document

Yet another strategy acknowledges the essential impossibility of maintaining perfectly canonical documentation for all documents. Using this strategy, some classes of documents are used on a “good enough” basis.

One way to render a document canonical is to create a PDF file and archive it in a repository so that it’s dated and not editable. This is especially good for documents that really matter, like contracts and non-disclosure agreements. Each signatory can electronically sign a mutually agreed-upon version, which is then stored electronically in a document repository. (You print out a copy for your own records of course, just in case.)
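As a rough sketch of the “dated and not editable” idea, the Python below records a SHA-256 fingerprint of a file with a timestamp and then marks the file read-only. This is an illustration under assumptions, not a substitute for a real repository or a legal e-signature service: the names fingerprint, freeze and is_canonical and the manifest file are all hypothetical, and a read-only flag only discourages edits.

```python
import hashlib
import os
import stat
from datetime import datetime, timezone
from pathlib import Path


def fingerprint(path: Path) -> str:
    """Return the SHA-256 hex digest of the file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def freeze(path: Path, manifest: Path) -> str:
    """Log the file's digest with a UTC timestamp, then make the file read-only."""
    digest = fingerprint(path)
    stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    with manifest.open("a", encoding="utf-8") as fh:
        fh.write(f"{stamp}  {digest}  {path.name}\n")
    # Drop write permission. This discourages edits; it is not tamper-proof.
    os.chmod(path, stat.S_IREAD)
    return digest


def is_canonical(path: Path, digest: str) -> bool:
    """True if the file's bytes still match the recorded digest."""
    return fingerprint(path) == digest
```

Any copy floating around in email can later be checked against the recorded digest: if even one byte was changed, is_canonical returns False.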

Another way to render a document canonical is to publish it in a public venue such as a scientific journal, a magazine or as a blog post. It’s true that blog posts can be revised, but it’s also true that revisions can be tracked, with a policy of treating the current published revision as the canonical version.

Keeping documents current and correct is one problem, getting rid of them is another!

Getting rid of documents

In the first PC age (pre-computer), it was easy to get rid of any document: just burn it.

Now, even if a document is “deleted,” it may still lurk on a file system. Or it may be attached to a dozen emails. Or whatever.

Since burning it won’t work anymore, another method for “destroying” documents needs to be developed. Here’s a better way to think: document destruction is now a process, not a singular act. Unnecessary, out-of-date and irrelevant documents can be “unwound,” or wound down, by simply ignoring them and letting bit rot take its course.

Dealing with large numbers of documents

Short answer: hire a librarian when your documentation requirements become complex. Think of it in the same way you would hire an accountant once your business structure gets too complicated for you and your bookkeeper.

My tools and techniques

I have two main techniques for managing documents so that I know which one is canonical. The first is technology-based, the second is psychology-based:

  1. Archiving software
  2. Bit literacy

Archiving software is old news in the programming community, where revision control is critical to software development success in teams larger than 1 member. Personally, I consider revision control critical to my success even when I’m working alone!

Bit literacy is a way of encoding meta-data at every possible level into your information. For example, let’s say you store electronic invoices from subcontractors. It pays to develop a systematic method of naming and storing these invoices, such that you could tell someone how to find any one of them… over the phone.
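One possible naming scheme, sketched in Python: put the ISO date first so that sorting by name also sorts by date, then the vendor, then the invoice number. The scheme itself, the function name invoice_filename and the sample vendor are assumptions for illustration, not a standard.

```python
import re
from datetime import date


def invoice_filename(vendor: str, invoice_no: str, issued: date, ext: str = "pdf") -> str:
    """Build a name of the form YYYY-MM-DD_vendor-slug_inv-NUMBER.ext."""
    # Lowercase the vendor and collapse anything that is not a letter or digit to "-".
    slug = re.sub(r"[^a-z0-9]+", "-", vendor.lower()).strip("-")
    # Keep only letters and digits from the invoice number.
    number = re.sub(r"[^A-Za-z0-9]+", "", invoice_no)
    return f"{issued.isoformat()}_{slug}_inv-{number}.{ext}"


if __name__ == "__main__":
    # Hypothetical subcontractor invoice.
    print(invoice_filename("Acme Drilling, Inc.", "1042", date(2009, 7, 7)))
    # prints: 2009-07-07_acme-drilling-inc_inv-1042.pdf
```

A name like this can be read out over the phone: “the Acme Drilling invoice dated July 7, 2009, number 1042.”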

Electronic documents become invisible

Strangely enough, losing documents can be easier than ever. If you don’t print it out, and don’t have it archived on a 3rd party system (e.g., Gmail), your electronic document is essentially invisible in a way very different from having it stashed in a box in the attic. When documents are electronic-only, you run the risk of a document simply evaporating, as if it never existed. Perhaps your hard drive crashes. Perhaps you forget to copy everything from an old computer to a new computer. Boom. It’s gone!

Here are a couple of ways to increase visibility of your documents:

  1. Create one or two main document repositories (i.e., folders) on your computer for handling “miscellaneous” documents, then index ALL of your documents locally (that is, on your computer), and make a habit of searching your local hard drive(s) for information as well as searching the internet. You may find what you need buried in a PDF file or a Microsoft Word document you forgot you have. The downside: indexing takes up a lot of hard drive space.
  2. Use a repository and archiving system as mentioned before, such as Subversion. As long as you keep track of the archive repository, you will have a copy. NOTE: I’ve lost repositories through lack of maintenance and changing hosts!
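To show what local indexing amounts to, here is a toy sketch in Python: it builds a word-to-files index over the plain-text files in a folder and answers “which files contain all of these words?” Real desktop search tools do far more (PDF and Word extraction, ranking, incremental updates); the function names build_index and search are hypothetical, and this version only reads .txt files.

```python
import re
from collections import defaultdict
from pathlib import Path


def build_index(root: Path) -> dict[str, set[Path]]:
    """Map each lowercase word to the set of .txt files under root that contain it."""
    index: dict[str, set[Path]] = defaultdict(set)
    for path in root.rglob("*.txt"):
        text = path.read_text(encoding="utf-8", errors="ignore")
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(path)
    return index


def search(index: dict[str, set[Path]], query: str) -> set[Path]:
    """Return the files that contain every word in the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return set()
    hits = [index.get(word, set()) for word in words]
    return set.intersection(*hits)
```

The index holds an entry for every distinct word in every file, which is why indexing costs disk space (or memory, in this sketch).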

Disclaimer: I’m flying blind on this topic, writing from my personal experience managing electronic documents for almost 20 years. Some university probably offers a degree in this… but what you just read is practical application for small business.


*I Am Not An Accountant. If you’re billing government work, make sure to retain an accountant familiar with government accounting procedures.

{ 2 comments… read them below or add one }

Sue September 24, 2009 at 1:46 pm

Definitely a worthy subject. The organization of the post takes on the same “disjointedness” as our document systems.

I like blog posts that only have one set of numbers. “3 ways to…” “5 reasons….”

What you have is several sets of numbers or bullets (on recollection, without going back and reading a second time). And that already has my over-logged mind confused.

What I might have liked:
- a micro-tutorial on how to name your documents, or bit literacy
- a case study that I can relate to that encompasses all my problems (i have current client stuff on my hard drive and Basecamp, archived client stuff on my server, and random notes and thoughts in Evernote). I _think_ I have it figured out but who knows…

Reading about this reminds me of how much more I could be doing. And all those old archives that will never be organized just sitting in a clump. Makes me feel overwhelmed.

David M. Doolin, PhD September 24, 2009 at 10:17 pm

@Sue – This was not a particularly easy article to write!

Based on your comment, I can see at least a couple more articles based on this article, and probably a rewrite of this one as well. Might be a couple of weeks before I get around to it though.