Saturday, 16 November 2019

Maximum Entropy

While discussing Rossini's music with Loet Leydesdorff a couple of weeks ago (after we had been to a great performance of The Barber of Seville), I mentioned the amount of redundancy in the music - the amount of repetition. "That increases the maximum entropy," he said. This has set me thinking, because there is a lot of confusion about entropy, variety, uncertainty and maximum entropy.

First of all, the relationship between redundancy and entropy is one of figure and ground. Entropy, in Shannon's sense, is a measure of the average surprisingness in a message. That surprisingness is partly produced because all messages are created within constraints - whether it is the constraints of grammar on words in a sentence, or the constraints of syntax and spelling in the words themselves. And there are multiple constraints - letters, words, grammar, structure, meaning, etc.

Entropy is easy to calculate. There is a famous formula without which much on the internet wouldn't work.

$$H = -\sum_{i} p_i \log_2 p_i$$

where $p_i$ is the probability of the i-th symbol of the alphabet.

Of course, there are lots of questions to ask about this formula. Why is the log there, for example? Just to make the numbers smaller? Or to give weight to something? (Robert Ulanowicz takes this route, arguing that the log was there in Boltzmann in order to weight the stuff that wasn't there.)

Redundancy can be calculated from entropy - at least theoretically.

Shannon's formula implies that for any "alphabet" there is a maximum value of entropy - the maximum entropy - reached when every symbol is equally probable. If the measured entropy is seen as a number between 0 and this maximum, then to calculate the "ground", or the redundancy, we simply take the ratio of the measured entropy to the maximum entropy and subtract it from 1.
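In symbols (notation mine, with n the size of the alphabet):

$$R = 1 - \frac{H}{H_{max}}, \qquad H_{max} = \log_2 n$$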

Now, mathematically, if the redundancy (R) increases, then either the entropy (H) decreases or the maximum entropy (Hmax) increases. If we simply repeat things, you could argue that H goes down because the message becomes less surprising, and therefore R goes up. If by repeating things we generate new possibilities (which is also true in music), then we could say that Hmax goes up.
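A minimal sketch of the whole calculation in Python (the function name is mine, and I assume the alphabet is simply the set of symbols observed in the message):

```python
import math
from collections import Counter

def entropy_and_redundancy(message):
    """Shannon entropy H (bits/symbol), maximum entropy Hmax, and
    redundancy R = 1 - H/Hmax for the symbols observed in message."""
    counts = Counter(message)
    n = len(message)
    probs = [c / n for c in counts.values()]
    H = -sum(p * math.log2(p) for p in probs)
    H_max = math.log2(len(counts))  # equiprobable symbols maximise H
    R = 1 - H / H_max if H_max > 0 else 1.0
    return H, H_max, R

print(entropy_and_redundancy("aaaaaaab"))  # roughly (0.544, 1.0, 0.456)
```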

No composer, and no artist, ever literally repeats something. Everything is varied - the variation form in music being the classic example. Each new variation is an alternative description, and each introduces new possibilities. So I think it is legitimate to say that the maximum entropy increases.

Now, away from music, what do new technologies do? Each of them introduces a new way of doing something. That too must be an increase in the maximum entropy. It's not an increase in entropy itself. So new technologies introduce redundant options which increase maximum entropy.

If maximum entropy is increased, then the complexity of messages also increases - or rather, the potential for disorder and surprise increases. The important point is that in communicating and organising, one has to make a selection. Selection, in this sense, means reducing the entropy: against however many options we have, we insist on saying "it's option x". Against a background of increasing maximum entropy, this selection gets harder. This is where "uncertainty" lies: it is the index of the selection problem within an environment of increasing maximum entropy.

However, there is another, more difficult problem. Shannon's formula counts an "alphabet" of signals or events - a, b, c, etc. Each has a probability, and each contributes to the total. Is an increase in the maximum entropy an increase in the alphabet of countable events? Intuitively it feels like it must be. But how can the calculation ever be made if, at any given moment, the alphabet is incomplete?
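To make this concrete (numbers mine): with an alphabet of 4 equiprobable symbols, Hmax = log2 4 = 2 bits; the moment a fifth symbol appears, Hmax jumps to log2 5 ≈ 2.32 bits. Any redundancy figure computed before the fifth symbol arrived was provisional.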

This is the problem of the non-ergodic nature of life processes. I've attempted a solution to this which examines the relative entropies over time, treating new events as unfolding patterns in these relations. It's a bit simplistic, but it's a start. The mechanism that seems to drive coherence is able, through the production of redundancies which increase the maximum entropy, to construct over time a pattern which serves to make the selection and reduce the entropy to zero. This is wave-like in nature: a wave of increasing maximum entropy leads to a selection which takes the entropy to zero, and is followed by another wave, building on the first, but basically doing the same thing.
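A toy sketch of what "relative entropies over time" might look like in code (my gloss on the idea, not the actual method): compute the entropy of successive windows of a stream, so that newly appearing symbols show up as changes in the trajectory rather than breaking a single global calculation.

```python
import math
from collections import Counter

def windowed_entropy(stream, window=32):
    """Shannon entropy of successive windows of a (possibly non-ergodic)
    stream. The alphabet is re-counted per window, so symbols that have
    not yet appeared simply play no part in earlier estimates."""
    for i in range(0, len(stream) - window + 1, window):
        chunk = stream[i:i + window]
        probs = [c / window for c in Counter(chunk).values()]
        yield -sum(p * math.log2(p) for p in probs)
```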

In the end, everything is zero.

Sunday, 10 November 2019

Design for an Institution: the role of Machine Learning #TheoryEDTechChat

There's an interesting reading group in Cambridge on the theory of educational technology at the moment. Naturally enough, the discussion focuses on the technology, and then on the agency of those operating the technology. Since the ontology of technology and the ontology of agency are mired in metaphysics, I'm not confident that the effort is going to go anywhere practical - although it is good to see a focus on Simondon, and on the particularly brilliant Yuk Hui.

But that raises the question: What is the thing to focus on if we want to get practical (i.e. make education better!)? I don't think it's technology or agency. I think it's institutions - we never really talk about institutions! And yet all our talk is framed by institutions, institutions pay us (most of us), and institutions determine that it is (notionally) part of our job to think about a theory of educational technology. But what's an institution? And what has technology done to them?

It is at this point that my theoretical focus shifts from the likes of Simondon, Heidegger, and co (great though I think this work is), to Luhmann, Stafford Beer, Leydesdorff, von Foerster, Ashby and Pask.

Luhmann is a good place to start. What's an institution? It is an autopoietic system which maintains codes of communication. "Autopoietic" in this sense means that the codes of communication are reproduced by people ("psychic systems"), but that the "agency" of people in communicating is driven by the autopoietic mechanism (in Luhmann's jargon, psychic systems are "structurally coupled" to it). "Agency" is the story we tell ourselves about this, but it is really an illusion (as Erich Hörl has powerfully argued in his recent "The archaic illusion of communication").

By this mechanism, institutions conserve meaning. I wonder whether they also conserve information; Leydesdorff has done some very important work applying Shannon's information theory to academic discourse.

Ashby's insight into information systems becomes important: "Any system that categorises effectively throws-away information" he wrote in his diary. That seems perverse, because it means that our so-called information systems actually discard information! But they do.
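A toy illustration (example mine, not Ashby's): categorisation is a many-to-one mapping, and a many-to-one mapping cannot be inverted - the detail it collapses is exactly the information thrown away.

```python
def grade(score):
    """Categorise a numerical score: a many-to-one map."""
    return "pass" if score >= 40 else "fail"

scores = [39, 40, 71, 95]
print([grade(s) for s in scores])  # ['fail', 'pass', 'pass', 'pass']
# Three different scores collapse into 'pass'; given only the category,
# the original values - the discarded information - cannot be recovered.
```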

For Luhmann, discarding information means that the probability that communications will be successful (i.e. serve the mechanism of autopoiesis in the institution) is reduced. As he pithily put it in his (best) book "Love as Passion": "All marriages are made in heaven, and fall apart in the motorcar". What he means is that when one person in a couple is driving, their lifeworld is completely different from their partner's. The context for meaningful communication is impaired by the mismatch between the communicative activities each is engaged in.

In our social media dominated world, where alternative lifeworlds metastasise at an alarming rate, the effect of technology in damaging the context for the effective conservation of meaning is quite obvious.

In the technocratic world of the modern university, where computer systems categorise students with so-called learning analytics, it is important to remember Ashby: with each categorisation, information is thrown away. With each categorisation, the probability that communications will be successful is diminished, as the sphere of permissible speech acts becomes narrower. Instead of talking about the things that matter most deeply, conversations become strategic, seeking to push the buttons which are reinforced by the institutional systems: not only the bureaucratic systems of the university, but the discourse system of the publishers and the self-promotion system of social media. This is the real problem with data.

The problem seems quite clear: Our institutions are haemorrhaging information. It is as if the introduction of information systems was like putting a hole in the hull of the institutional "ship".

Stafford Beer knew this problem. It is basically what happens when the coordination and control part of his "viable system model" (what he called "System 3") takes over at the expense of the more reflective, curious function that probes the environment and examines potential threats and opportunities (what he called "System 4"). In companies, System 4 is the R&D department. It is notable that universities don't have R&D departments! Increasingly, R&D is replaced by "analytics" - the System 4 function is absorbed into System 3, where it doesn't belong.

But let's think more about the technology. System 3 tools categorise stuff - they have to; it's part of what System 3 does. This involves selecting the "right" information and discarding the rest. It is an information-oriented activity. However, the opposite of information is "redundancy" - pattern, repetition, saying the same thing in many ways... in education, this is teaching!

Machine learning, by contrast, is predominantly a redundancy-based operation: it depends on multiple descriptions of the same thing, from which it learns to predict data it hasn't seen before. I'm asking myself whether this redundancy-oriented operation is actually a technological corrective. After all, one of the things the curious and exploratory System 4 function has to do is to explore patterns in the environment and invent new interventions based on what it "knows". Machine learning can help with this, I think.
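A toy illustration of the redundancy principle (example mine - not a neural network, but the same logic): given many varied descriptions of a single underlying pattern, the repeated structure survives the variation and can be recovered.

```python
import numpy as np

rng = np.random.default_rng(0)
pattern = np.array([1.0, 0.0, 1.0, 1.0, 0.0])

# Many redundant, varied descriptions of the same underlying pattern
descriptions = pattern + rng.normal(0, 0.5, size=(1000, 5))

# A learner exploits the redundancy: the repeated structure survives the noise
print(np.round(descriptions.mean(axis=0), 2))  # close to [1. 0. 1. 1. 0.]
```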

But only "help". Higher level coordination functions such as system 4 require human intelligence. But human intelligence needs support in being stimulated to have new kinds of conversations within increasingly complex environments. Machine learning can be incredibly and surprisingly creative and stimulating. It can create new contexts for conversations between human beings, and find new ways of coordinating activities which our bureaucratic systems cannot.

My hunch is that the artists need to get on to this. The new institutional system 4, enhanced by machine learning, is the artist's workshop, engaging managers and workers of an organisation into ongoing creative conversation about what matters. When I think about this more deeply, I find that the future is not at all as bleak as some make out.

Tuesday, 5 November 2019

Non-Linear Dynamics, Machine Learning and Physics meet education

In my recent talk about machine learning (in which I have been focusing particularly on convolutional neural networks, because they present such a compelling case for how the technology has improved), I explored recursive algorithms for classifying data, such as k-means. The similarity between the non-linear dynamics of agent-based modelling and the recursive loss functions of convolutional neural network training is striking. It is hard for people new to machine learning to grasp how little we know of what is going on inside. The best demonstration of why we know so little comes from demonstrating the non-linear, emergent behaviour of an agent-based model. Are they actually the same thing in different guises? If so, then we have a way of thinking about their differences.

The obvious difference is time. A non-linear agent-based model's behaviour emerges over time. Some algorithms settle on fixed points (if k-means didn't do this, it would be useless), while other models continue to feed their outputs into their inputs, endlessly producing streams of emergent behaviour. The convolutional training process appears to settle on fixed points, but in fact it rarely fully "settles" - one can run Python's model.fit() forever and no completely stable version emerges, although stability is established within a small fluctuating range.
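A minimal sketch of k-means as a recursive map settling on a fixed point (Lloyd's algorithm; the function and its defaults are mine):

```python
import numpy as np

def kmeans(points, k=2, iters=100, seed=0):
    """Lloyd's algorithm: iterate assignment and re-centring until the
    centroids stop moving - i.e. until the recursion hits a fixed point."""
    rng = np.random.default_rng(seed)
    centroids = points[rng.choice(len(points), k, replace=False)]
    for _ in range(iters):
        # assign each point to its nearest centroid
        labels = np.argmin(np.linalg.norm(points[:, None] - centroids, axis=2), axis=1)
        # recompute each centroid (keeping the old one if a cluster empties)
        new = np.array([points[labels == j].mean(axis=0) if np.any(labels == j)
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):  # fixed point: the map reproduces itself
            break
        centroids = new
    return centroids, labels
```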

I discussed this fluctuation with the Belgian mathematician Daniel Dubois yesterday. Daniel's work is on anticipatory systems: he built a mathematical representation of the dynamics originally described by the biologist Robert Rosen. Anticipation, in Dubois's work, results from fractal structures. In a sense, this is obvious: to see the future, the world needs to be structured in such a way that patterns established in the past can be seen to relate to the future. If machine learning systems are anticipatory (and they appear able to predict categories of data they haven't seen before), then they too will contain a fractal structure.
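To give a flavour of what Dubois means by anticipation, here is a sketch of his incursive logistic map as I understand it from his published work - treat the details as my reconstruction, not a definitive statement of his model. The next state appears on both sides of its own update rule, x(t+1) = a·x(t)·(1 − x(t+1)), and is solved for algebraically before the step is taken:

```python
def incursive_logistic(x0, a=3.8, steps=20):
    """Incursive logistic map in the style of Dubois: the 'anticipated'
    state x[t+1] occurs in its own update rule, giving
    x[t+1] = a*x[t] / (1 + a*x[t]) once solved algebraically."""
    xs = [x0]
    for _ in range(steps):
        x = xs[-1]
        xs.append(a * x / (1 + a * x))
    return xs

# Where the ordinary logistic map with a = 3.8 is chaotic, this incursive
# version converges to the fixed point (a - 1) / a, about 0.737:
print(incursive_logistic(0.2)[-3:])
```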

Now, a fractal is produced through a recursive non-linear process which results in fixed points. This all seems to be about the same thing. So the next question (one I was putting both to Daniel Dubois and to Loet Leydesdorff, whom I saw at the weekend) is: how deep does this go? For Loet, the fractal structures are in communication systems (Luhmann's social systems), and (importantly) they can be analysed using Shannon's information theory. Daniel (on whose work Loet has constructed his system) agrees. But when we met, he was more interested in talking about his work in physics on the Dirac equation, and what he believes to be a deeper significance of Shannon. I don't fully understand this yet, but we both agreed that if there is a deeper significance to Shannon, it was a complete accident, because Shannon only half-understood what he was doing... Half-understanding things can be a way forward!

Daniel's work on Dirac mirrors that of both Peter Rowlands in Liverpool and Lou Kauffman in Chicago (and now Novosibirsk). They all know each other very well. They all think that the physical world is basically "nothing", and they agree on the language of nilpotents (things which multiply to zero) and quaternions (complex numbers which produce a rotational geometry) as the fundamental building blocks of nature. There is an extraordinary intellectual confluence emerging here which unites fundamental physics with technology and consciousness. Who could not find that exciting? It must have significance for education!
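For readers new to the jargon, a minimal concrete example of a nilpotent (example mine): a non-zero matrix whose square is zero.

```python
import numpy as np

# N is not zero, yet N @ N is the zero matrix: N is nilpotent
N = np.array([[0, 1],
              [0, 0]])
print(N @ N)  # [[0 0]
              #  [0 0]]
```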

What's it all about? The clue is probably in Shannon: information. And I think it is not so much the information that is involved in learning processes (which has always been the focus of cognitivism). It is the way information is preserved in institutions - from the very small institutions of friendship and family, to larger ones like universities and countries.

Our technologies are technologies of categorisation, and they throw away information. Since the computer revolution, holes have appeared in our social institutions which have destabilised them. The anticipatory function, which is essential to all living things, was replaced with a categorising function. The way we use machine learning also tends to categorise, which makes things worse. But if it is an anticipatory system, it can do other things: it can provide a stimulus for thought and conversation, and in the process put information back into the system.

That is the hope. That is why we need to understand what this stuff does. And that is why, through understanding what our technology does, we might understand not only what we do, but what our institutions need to do to maintain their viability.

Education is not really about schools and universities. Those are examples of institutions which are now becoming unviable. Neither, I think, is it really about "learning" as such (as a psychological process - which ultimately is uninspectable). Education is about "institutions" in the broadest sense: families, friendships, coffee bars, businesses, hospitals... in fact anywhere which maintains information. To understand education is to understand how the processes which maintain information really work, how they can be broken with technologies, and how they can be improved with a different approach to technology.