Thursday, October 05, 2006

Tagging

Tagging is a very common feature in database-type applications nowadays. It's very tempting to try to use it as a hierarchical or disjoint categorisation tool, but this is flawed! I have tried to use tags that way, but it doesn't work well, and it breaks the real idea of what tagging is about.
Tagging is supposed to be haphazard and instinctive. If you scatter many tags everywhere, and use the tags, you will naturally settle into a vocabulary that unlocks a lot of extra information from your database, whether you're using a personal or a social tagging system.
A small number of well-chosen tags that impose a hierarchy or a disjoint categorisation reduces your flexibility in choosing tag names in the future, and makes things needlessly constrained.
As for social categorisation systems, maybe we need parallel systems: tagging, hierarchies, sets and more. The more ways we have to link up data, the more useful it is, I would say!
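To make that concrete, here's a minimal sketch (in Python, with names invented for illustration) of a flat tag store: any hierarchy you need can be emulated at query time with set intersections, so the tag vocabulary itself stays unconstrained.

    # A flat tag index: tags are plain labels; structure comes from queries.
    from collections import defaultdict

    class TagIndex:
        def __init__(self):
            self.items_by_tag = defaultdict(set)

        def tag(self, item, *tags):
            for t in tags:
                self.items_by_tag[t].add(item)

        def all_of(self, *tags):
            # Intersection stands in for a hierarchy at query time,
            # e.g. all_of('music', 'jazz') without fixing a music > jazz tree.
            sets = [self.items_by_tag[t] for t in tags]
            return set.intersection(*sets) if sets else set()

    index = TagIndex()
    index.tag('post-1', 'music', 'jazz', 'review')
    index.tag('post-2', 'music', 'gig')
    print(index.all_of('music', 'jazz'))   # {'post-1'}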

Wednesday, August 16, 2006

decaying filesystems

Wikipedia is a big collection of interconnected articles, each with its own edit history.
If we view each article-plus-history as one object, then it will come as no surprise that links between articles point to the most recent version of the target article.
But if instead we view each version of an article as a separate article, a different picture emerges. Instead of Wikipedia being a big collection of interconnected articles, we have a big collection of interconnected stacks of articles, where each stack represents the article and its previous versions.
Now why do links between articles automatically point to the 'top of the stack'? The article may have changed out of all recognition from what it was further down the stack (earlier in its history).
In fact, we arrive right at the paradox of the heap: when does an article undergoing many small incremental changes actually cross the boundary between being a modification of its earlier self and being a totally new article?
The answer, of course, is that it doesn't. The boundary is entirely artificial, and almost entirely useless.
So this is the fix: Wikipedia article links should link through to the version of the article that is relevant to the text linking to it. Intuitive, no?
This leads to some nice behaviours: because links point to specific versions of an article, you know you're pointing at what you intended to point at. There is also more emphasis on the age and stability of the information you're viewing.
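Here's a toy model (Python, with names of my own invention) of what version-pinned links could look like: each article is a stack of revisions, and a link records the exact revision it was written against rather than implicitly meaning 'top of the stack'.

    # Toy model: articles as stacks of revisions, links pinned to revisions.
    class Article:
        def __init__(self, title):
            self.title = title
            self.revisions = []              # index 0 = oldest version

        def edit(self, text):
            self.revisions.append(text)
            return len(self.revisions) - 1   # revision id of the new version

    class Link:
        def __init__(self, target, revision):
            self.target = target
            self.revision = revision         # pinned: never silently follows the head

        def resolve(self):
            return self.target.revisions[self.revision]

    heap = Article('Paradox of the heap')
    rev = heap.edit('A first stub about heaps...')
    link = Link(heap, rev)                   # points at what the author actually read
    heap.edit('A complete rewrite, unrecognisable from the stub.')
    assert link.resolve() == 'A first stub about heaps...'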
But perhaps it would be tedious to have to keep updating links as articles improved over time. Perhaps there could be mechanisms that tracked people's browsing paths and updated links automatically. Or perhaps it would form the basis of a new recommendation system: major edits would gain approval from the community by getting linked to.
This system also suggests another improvement: branching articles. Each modification to an article is actually a branch. Vandalized branches would soon die, while community-approved branches would blossom.
You could even have a system whereby short branches with low link counts (or a high proportion of links from ancient but successful branches, representing abandoned links) could be migrated to lower quality media, or even disposed of: a sort of garbage collection for high-level human-readable information.
Branches are also excellent in that they solve the article renaming, moving, merging and splitting problems at a stroke: because the mechanisms to redirect links quickly are already in place, it is comparatively cheap to perform these operations.
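As a rough sketch of that garbage-collection idea (thresholds and field names entirely made up), each branch could carry a count of incoming links, with short, unlinked branches becoming candidates for demotion to cheaper storage or disposal:

    # Rough sketch of garbage collection for article branches.
    from dataclasses import dataclass

    @dataclass
    class Branch:
        name: str
        length: int = 1          # number of revisions on this branch
        incoming_links: int = 0  # a crude community-approval signal

    def collectable(branch, min_length=3, min_links=1):
        # Short branches that nobody links to are presumed abandoned
        # (e.g. reverted vandalism) and can be migrated or disposed of.
        return branch.length < min_length and branch.incoming_links < min_links

    branches = [Branch('main', length=40, incoming_links=120),
                Branch('vandalised', length=1, incoming_links=0)]
    print([b.name for b in branches if collectable(b)])   # ['vandalised']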
The possibilities are endless. It would be quite a job to get right, but it's something we could benefit from a lot, I think.

Monday, September 19, 2005

letting things come to me

Since I've become more aware of the Web 2.0 revolution, I've noticed more and more web technologies that I'd been thinking about years ago rising to the surface. A key example was an idea I had about online libraries where you list your books, and other people can then browse your library and, if appropriate, make requests to exchange or borrow books. It would be like an enormous communal, virtual bookshelf, except you'd get to read real books instead of reading off a computer screen (which, despite what anyone says, is still less comfortable than reading a book! at least for a majority of users...).
Well, now we have Listal, LibraryThing and AllConsuming. I'm not sure who owns them and whether I really want to commit my data to them, but the services are definitely there and doing what I thought such services should do years ago.
Which brings me to my main point. With services like these, I shouldn't have to think about committing my data. My data should reside where I want it to, and I should allow these services access to my data on my terms and conditions.
We need some sort of platform for maintaining information, and then transforming it and submitting it, preferably in a relatively extensible and/or standardised way. XML- and XSLT-style technologies seem to be screaming out to be used in this sort of position.
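As a small illustration of the kind of thing I mean (the profile format here is invented, but lxml and XSLT are real), a locally held XML profile could be reshaped for one particular service like this:

    # Sketch: transform a locally held profile for one service via XSLT.
    from lxml import etree

    profile = etree.XML(
        '<profile><name>R. Reader</name><book isbn="0140449337"/></profile>')

    to_library_site = etree.XSLT(etree.XML('''
        <xsl:stylesheet version="1.0"
            xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
          <xsl:template match="/profile">
            <member><xsl:value-of select="name"/></member>
          </xsl:template>
        </xsl:stylesheet>'''))

    # Prints something like: <member>R. Reader</member>
    print(str(to_library_site(profile)))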
In addition, we need some sort of voluntary code of conduct whereby we can be reasonably assured that we can reliably dictate the terms under which such services use our data. Maybe some open-source datakeeper software modelled on recent digital rights management advances, so that, just as record companies can control our rights over the music we license from them, we can also revoke other organisations' rights over the data they license from us.
Digital rights management isn't necessarily a bad thing, but when it is biased towards the goals of the powerful it is clearly not a good thing.
So the data landscape of the future? Data residing in multiple incarnations on various storage devices across the world, controlled by open source datakeeper software allowing only authorised people to access it, and transform it, using flexible tools. The owners of the data - you and me in addition to the organisations and corporations - empowered by our data's newfound mobility and flexibility.
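A minimal sketch of the datakeeper mechanism itself (all names invented): the essential move is that services hold revocable grants rather than unconditional copies, and every read is checked against the grant registry.

    # Invented sketch of a datakeeper: services hold grants, not copies.
    class DataKeeper:
        def __init__(self, data):
            self._data = data
            self._grants = set()   # services currently allowed access

        def grant(self, service):
            self._grants.add(service)

        def revoke(self, service):
            self._grants.discard(service)

        def read(self, service):
            if service not in self._grants:
                raise PermissionError(service + ': access revoked or never granted')
            return self._data

    keeper = DataKeeper({'books': ['example-book-1']})
    keeper.grant('librarything.example')
    keeper.read('librarything.example')    # allowed, on my terms
    keeper.revoke('librarything.example')  # and rescindable at any time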

Saturday, September 03, 2005

some musings on data and interpretation

Some ideas that came out easily, on the 9th of June 2005. It feels like they're going somewhere, but not without some thought, and probably a lot of maths and programming.

  • Data is inextricably linked to the methods that process it.

  • Memories have a language of their own; do humans share the language of memories? Could a goal of humanity be the effective translation of our language of memories? Could imperfection in translation be a huge source of conflict too?

  • Information is symbolic; without a means of interpreting the symbols, the information conveys nothing.

  • Once the symbols can be distinguished, patterns within the symbols can be accessed. However, the original meaning (intent) may be lost and will generally be distorted.

  • Given a set of symbols and an interpretative mechanism for those symbols, to what extent are the patterns spotted a result of information within the interpretative mechanism, and to what extent are they a function of the information within the symbols? I rather suspect that this question misses a point: it falls foul of the fallacies of subject/object metaphysics. Within any interaction there is participation from both sides. For an interpretative mechanism to detect patterns within symbols, there will have to be a contribution of information from both sides, at some level.

  • Interpretative mechanisms range in style: some seek to minimize their input of information while maximizing the effect of the external data, others use external data as a randomizing element or mixing agent for expression of their own internal data (maybe this is a good framework for interpreting the occult/astrology/science etc.?).

  • An interpretative mechanism which includes the assumption that it doesn't affect the data it processes is ultimately flawed (see subject/object metaphysics). Science often falls foul of this.

  • Data with no obviously associated interpretative mechanism is worth less, all other things being equal, than data with an interpretative mechanism.

  • High-quality data may be restricted by a low-quality interpretative mechanism, and vice versa.

  • Data can be represented by a set of symbols. An interpretative mechanism can be represented by a processor for that set of symbols, which may generate another set of symbols in response, or may create some physical output or whatever (see the sketch after this list).

  • Data never exists purely as a set of symbols (except in the universe of platonic forms...). Instances of data almost always have some elements of the interpretative mechanism held locally. That is to say, the interpretative mechanism has a large influence on the manifestation of the data. Usually data and an interpretative mechanism co-evolve, and are intimately interconnected, even if only at substrate levels (e.g. dependence on ASCII, or on spoken language, or whatever).

  • An interesting model of reality is as merely a seething array of interdependent information. The extent of this information is phenomenal, and its levels of structure range widely across the orders of magnitude.
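To illustrate the symbols-and-processor point above, here is the promised sketch: a contrived Python example of my own in which the same pair of bytes reads quite differently under two interpretative mechanisms, so part of each pattern is clearly contributed by the mechanism itself.

    # The same symbols under two interpretative mechanisms.
    symbols = b'\x48\x69'

    def as_text(data):
        return data.decode('ascii')          # leans on the ASCII substrate

    def as_number(data):
        return int.from_bytes(data, 'big')   # leans on positional arithmetic

    print(as_text(symbols))     # 'Hi'   - pattern partly supplied by ASCII
    print(as_number(symbols))   # 18537  - pattern partly supplied by arithmetic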

Tuesday, August 23, 2005

a brief manifesto (for myself)

The computing substrate seems to be improving at an ever-increasing rate. But there are some things that I think need improving, some on the level of user education, others technological. Here are a few that spring to mind:

  • Provision of parallel architectures that are actually properly designed
    We have amazing bleeding-edge facilities nowadays. Progress is rampant, and in all directions, like bacterial colonies on agar jelly. We have various standards groups like the W3C trailing behind doing their best to mop up the spillage, but how good is the result? The ideas mill is in overdrive; now all we need is groups to set these ideas in their proper places.
    I'm not advocating design fascism, but I am advocating the more widespread rolling out of bulletproof architectures that the military wouldn't be afraid to use.
    The secret to success is to realise that there are only a very few tricks in the book, but that those tricks are extremely powerful. The ideas mill churns out specific permutations. Someone needs to run along behind spotting the underlying patterns, and making simple but highly generalised tools that the rest of us can use without wading through piles of barely distinguishable competing 'standards' and hacks.
    Any contenders that I'm not aware of yet?
  • Data awareness
    Our data shadows are burgeoning. Technologies need to be developed that protect our data and maximise its effectiveness. I have separate user profiles on multiple social websites. When will I be able to store my profile locally, allow websites to access it when I wish, and allow them to store their own copies only if I permit it? Decent data management would make the computing substrate so much more useful. I wouldn't have old data all over the internet crying out to be maintained. I could participate in a far wider variety of stuff. Service providers could concentrate on what they were good at, not on re-collecting all the various data that everyone is tired of giving for the umpteenth time anyway.
  • Better documentation
    All due respect to the wikis of this world, but their linking systems are beginning to show their age. We need real databases with real tagging systems, and proper diagram support. Why are we still relying on bitmaps on the web? Macromedia's solution is a poor stopgap. What happened to SVG? A few more diagrams would help reduce the mess the ideas machine spews out, and improve quality for us all.
In a word, integration. We need to allow everything to be a whole lot more cohesive and consistent, while maintaining the exciting churning hotchpotch of creativity that provides us with the ideas that need organising in the first place.