Monday, 19 January 2015

Is meta-theory a theory of education? Some thoughts on a journey from 'information' to the ontological theatre of the classroom

Today scientists tend to operate within what we call 'discourses'. In real terms, that means journals, conferences, email lists, departments, and so forth. Discourses are peculiar things: they become little micro-worlds like different planets, whilst their presence becomes closely tied to the institutional fabric which sustains them. To participate in a discourse is to validate one's position in the university - and that's as much about paying the mortgage as it is about contributing to knowledge. What if we were to try to understand the whole show - the knowledge, the mortgage, the egos, etc? That's the show where people think of new theories, talk about them and teach them to students whilst they live their lives in the world, buy cars, eat pizza and ferry the kids around. How could different theories be situated against a meta-theory - a theory of theories? Can I answer my question without recourse to a discourse? Probably not. But if there is a discourse which might help, it is that which concerns the stuff which is between all of us - which we conventionally call 'information'. So I want to start by talking about information theory - particularly about the relationship between redundancy and information - and work towards an argument for taking education seriously because it is, it seems to me, the place where all aspects of the world come together.

Redundancy is a confusing word. In information theory, it's associated with repeating things. There is a relationship between repeating things and the expectations of future events. The question is: if things are repeated, do we become more or less certain about what is going to happen next? We might say 'more' because we might expect the next thing to be a repetition of what's just happened. We might say 'less' because, by virtue of something being repeated, we might expect something different to happen. Then there is the relationship between redundancy, information and entropy. Entropy, first defined in Boltzmann's physics, is basically (as Paul Davies elegantly argues) a measure of 'ignorance': "if we know all the molecules in a box are in one corner, then it has low entropy"; we are relatively certain about the location of the molecules. Conversely, if the molecules are evenly dispersed in the box, then we really haven't a clue (we are ignorant of) where any of them are exactly - that's high entropy. Information is the opposite of entropy just as it is the opposite of ignorance.

Ulanowicz has pointed out recently that when Shannon borrowed Boltzmann's equations, he took things the other way round. Ulanowicz blames Von Neumann for Shannon's mistake: Von Neumann was known to joke around, and unfortunately Shannon took his joke about entropy (there can't be many of those!) seriously. Thus entropy became a measure of the average "surprise" value in a communication. Considering each part of the message in the communication, if the probability of a message part was close to 1, its calculated surprise (-log p) was very small (so no surprise!); if the probability of the message part was very small, the calculated surprise was big (so big surprise!). Reflecting back on education for a minute, my experience is that we tend to remember 'big surprises'; small surprises cause us to have to 'work hard', usually repeating things over and over again (redundancy!). Interestingly though, we seem to take the small surprises more seriously...
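To make the arithmetic concrete, here is a minimal sketch of the "surprise" of a single message part and of H as the average surprise. The function names, the choice of base-2 logarithms (bits) and the four-cell version of Davies' box are assumptions made purely for illustration.

```python
import math

def surprisal(p):
    """Surprise of one message part with probability p, in bits: -log2(p)."""
    return -math.log2(p)

def entropy(probs):
    """Shannon's H: the probability-weighted average surprisal, in bits."""
    return sum(p * surprisal(p) for p in probs if p > 0)

# A near-certain message part carries almost no surprise; a rare one carries a lot.
print(surprisal(0.99))
print(surprisal(0.01))

# Davies' box, crudely modelled as one molecule and four hypothetical cells.
corner = [1.0, 0.0, 0.0, 0.0]      # we know exactly where the molecule is
spread = [0.25, 0.25, 0.25, 0.25]  # it could be anywhere
print(entropy(corner), entropy(spread))
```

The same formula gives a low value for the box with everything in one corner and a high value for the evenly dispersed box, which is why the one number can be read either as ignorance or as average surprise.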

Boltzmann's entropy measure is of ignorance, or absence; Shannon appears to regard ignorance as a prelude to possible surprise. He might say his measure, H, is the "average presence of surprises". The problem is that there's so much tied up in the idea of 'surprise': it entails much more than a measure of certainty about molecules in space. Emphasising this, Ulanowicz cites Ernst von Weizsäcker in remarking that the confirmation of a message is entirely different from its being a surprise. With regard to Shannon's H measure, Weizsäcker (quoted by Ulanowicz) concludes that “meaningful information … does not lend itself as being quantified by one single mathematical expression”. Ulanowicz then suggests that Shannon's measure of average "surprise" might be broken down and studied for the component which addresses 'surprise' itself, and the component which addresses 'confirmation'. To this latter component Ulanowicz gives the name 'flexibility'. This "flexibility" appears as a kind of reborn idea of redundancy.
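One way to see how a single H can be split into two parts is the textbook identity H(X,Y) = I(X;Y) + H(X|Y) + H(Y|X), where I is the mutual information between two variables. The sketch below applies it to a small joint probability table. Whether this is exactly the split Ulanowicz has in mind is not spelled out above, so reading the leftover term as his 'flexibility' is an assumption of this example, as are the function names and the toy tables.

```python
import math

def joint_entropy(joint):
    """H(X,Y) in bits for a joint probability table given as a list of rows."""
    return -sum(p * math.log2(p) for row in joint for p in row if p > 0)

def mutual_information(joint):
    """I(X;Y) in bits: how much knowing one variable constrains the other."""
    row_totals = [sum(row) for row in joint]
    col_totals = [sum(col) for col in zip(*joint)]
    return sum(
        p * math.log2(p / (row_totals[i] * col_totals[j]))
        for i, row in enumerate(joint)
        for j, p in enumerate(row)
        if p > 0
    )

def residual(joint):
    """What is left of H(X,Y) once the mutual constraint is taken out.

    Equal to H(X|Y) + H(Y|X); labelled 'flexibility' here only by assumption.
    """
    return joint_entropy(joint) - mutual_information(joint)

# Hypothetical tables: rows are what was sent, columns what was received.
locked = [[0.5, 0.0], [0.0, 0.5]]      # received always matches sent
loose = [[0.25, 0.25], [0.25, 0.25]]   # received is unrelated to sent

for name, table in (("locked", locked), ("loose", loose)):
    print(name, joint_entropy(table), mutual_information(table), residual(table))
```

In the locked table all of H is mutual constraint and nothing is left over; in the loose table there is no constraint at all and everything is residual.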

Shannon's mistake is not just due to Von Neumann. It is probably also due to a kind of anthropomorphising of 'information'. One of the great difficulties in thinking about information is that theories of all kinds struggle to avoid becoming theories of agency: we try to work out the stuff between us and the world and end up inadvertently describing a model of ourselves. I don't know of a theory of information which doesn't do this: from Maturana to Floridi, through Deacon and Maynard Smith, the ghost of Kantian idealism is impossible to shake off. Most theories of learning also suffer the same fate (indeed, they come from the same cybernetic/Kantian stable). Does Ulanowicz get away with it?

The issue at stake is not that we have no coherent theory of information (although, as Deacon says, we don't): we do not need new theories of information. I doubt any of them can escape the idealist trap. What we need is a new philosophy of science which situates the synthetic/empirical (this is Shannon's domain) with the analytical/logical. This means a theory about theory: we need better meta-theory.

I find this realisation helpful because, for all its shortcomings, Shannon's theory appears to be practically useful. From encryption to compression, the statistical measurement of "average surprise" has been powerful. Advanced techniques in analysing text like Topic Modelling produce spooky "deus ex machina" results that suggest that our 'meaning' might be objectively encoded in our words. Google Translate continues to amaze in its ability to pattern match linguistic expressions across languages (only this week, they demonstrated this). Then there's the remarkable ability to 'fingerprint' complex data streams - whether they are genomes, videos, music, or cortical activity. Pattern-based video description is practically with us; the next 10 years will probably see devices that can make a stab at reading your mind. Spooky. But there's a serious scientific problem here - a problem which could not have been apparent to Hume when he thought about what it was scientists were doing, and in doing so kick-started Kant.

When the data analysts play around with Shannon's equations, buoyed by their successes, they believe they are doing science. However, a quick return to Hume would lead us to ask the question "Where are the regularities?" Here we return to the fundamental problem in Shannon's work: surprise and confirmation are not the same. And just as the algorithms for Topic Modelling (or whatever it might be) appear to identify the topics of a document corpus, we are ourselves caught in a Shannon-like situation of having the topics 'confirm' what we think the documents are about (if they did not confirm it, we would probably discount the results!), whilst saving our "surprisal" for wondering at the magic of the algorithm in being able to confirm something we already knew! Where are the regularities? They are in the coincidence of surprise and confirmation. They are in the heads of the observers as they gaze into a world which they themselves have created. It is worth remarking that Voodoo exhibits this kind of regularity in a similar way. It is not difficult to see where the problems are. The emergent social impacts of automatic topic identification would result in new orderings of society, as those with the capacity to execute the algorithms on a large scale (Google, Apple, etc) would jockey for increased social power, usurping democratic structures. People would find ways around the algorithms. Everyday language might evolve: maybe we end up speaking a kind of Cockney rhyming slang to put the machines off the scent.

Hume's theory is a theory of expectation. Events create expectations because they lead us to rationalise about the way things work and what they are likely to do next. Hume believed that the nature of the world itself - whether constant conjunctions of events were natural or not - was undecidable: it might all be chaos upon which order was created by human thought. In other words, it was not God who imposed order on the void, but man. Reproducible experiment was the activity whereby the order-creating intellect could be brought to bear upon the world. But when we say 'world' we must be careful what we mean. The world of Shannon and the miraculous machines that owe him so much is a small one - a subset: one where humans look for regularities in their own creation. That it bears some kind of correspondence to the rest of the world is no more surprising than the needle in the voodoo doll coinciding with a pain in the stomach.

What doesn't happen with the voodoo doll, and what isn't happening with our current 'information' obsession is the kind of knocking-up against realities which would cause us to inspect our reasoning. It has become too easy for us to create universes where what we believe can be shown to hold, and for evolutionary arguments to be used to defend its explanatory deficiencies everywhere else with scientists saying "it's early days - we will get there!" Paul Davies suggests what might be necessary to counter this: "experimental ontology". I think I agree - it's very similar to Andrew Pickering's idea of 'ontological theatre', which I quite like.

So why does understanding education really matter? Because it is the best ontological theatre in town. It is the place where ideas, objects, matter and mattering collide. Most importantly, it's where ideas are formed.

The meta-theory is an educational theory.
