Wednesday 31 August 2022

Tacit Revolution

As technology advances in society, there is no escaping the increasing specialisation of knowledge. This may seem like a challenge to those like me who believe that the way forwards is greater interdisciplinarity, or a metacurriculum (as I wrote here: https://link.springer.com/article/10.1007/s42438-022-00324-1). While greater specialisation indicates increasing fracturing of the curriculum and the growth and support of niche areas, more fundamentally it represents the organisational challenge to find the best way to support specialised skills from the generalised organisational frameworks of curriculum, assessment, certification, etc. At the root of this challenge is the fact that generalised organisational frameworks - from KPIs to curriculum - all depend on the codification of knowledge. Meanwhile, hyperspecialisation is largely dependent on tacit knowledge which is shared among small groups of professionals in innovative niche industries, startups, and departments of corporations. 

Since Michael Polanyi's seminal work on tacit knowledge at the University of Manchester in the 1950s, the educational challenge of transmitting the uncodifiable has been grappled with in industry. The Nonaka-Takeuchi model of knowledge dynamics within organisations has been highly influential in understanding the ways that professional knowledge develops in workplace settings (see https://en.wikipedia.org/wiki/SECI_model_of_knowledge_dimensions). It is in these kinds of dynamics that we are seeing technology drive (and perhaps even accelerate) processes of tacit knowing and externalisation. It is no longer only the tacit knowledge of highly technical people that must be communicated to senior management: the technics of machine learning (for example) are quite different from the technics of database design, full-stack implementation or component manufacture, and yet each of these technical aspects is interdependent with the others. The communication between technical groups is critical - industries survive or fail on the quality of their internal communications. 

What is critical to making these communications work is dialogue. This may be partly why so many successful technical industries adopt flat management structures and dynamic, adaptive ways of configuring their organisations. That is driven by the dynamics of dialogue itself - the need for technical conversations to flow and develop. Compare this to the rigid hierarchical structure of the typical educational institution, burdened with its bureaucratic codified curricula. It is completely different, and one wonders how it will survive the changes in the world. 

Over the last year and a half, I have been involved in a research project at the University of Copenhagen on the digitalisation of education. This project was set up because universities at least recognise the problem - they are being left behind in a world which is changing too fast. But like all institutions, Copenhagen believed that the way to address this was to tweak its structures and what it "delivered" - basically, to change the curriculum. This was never going to work, and our project has shown it. But this is not because of a failure to get the "right" tweaks to the curriculum. It is because technical knowledge is largely tacit and uncodifiable, and the organisational structures of education cannot deal with tacit knowledge. Indeed, with the bureaucratisation of education, it is even less able to deal with tacit knowledge than it was in Polanyi's time. 

I am now working for the Occupational Health department at the University of Manchester. This is a domain in which professionals in health and industry identify and analyse the relationship between environmental circumstances (workplaces, etc.) and personal and public health. Professionals working in this area hold a great deal of knowledge which is the product of years of experience in the field. Much of it is uncodifiable. 

Uncodifiable does not mean untransmissible. Uncodifiable knowledge can be taught dialogically, and more importantly, technology can greatly assist in producing new kinds of dynamic dialogical situations where this transmission can take place. I am currently looking at ways of doing this in occupational health, but as I do so, I am thinking about what it means for technical education, hyperspecialisation and tacit learning. 

This is almost certainly going to be the trajectory of educational technology. It doesn't look like it at the moment because "edtech" sees itself (tacitly!) as "educational management tech" - we don't really have "technology for learning" as such. It's managers who write the cheques for edtech. But that will change. We're going to need "technology for learning".

There's then another challenge for institutions. Because when we do have "technology for learning", the dialogical situations of tacit learning will not need to be bound by classroom, curriculum, assessment, etc. They can be situated in the world alongside the real activities that people engage in. My own experience of co-establishing a medical startup around an AI solution to diabetic retinopathy diagnosis is indicating this, and there are many other similar startups. Mine started with an educational desire to teach people how to diagnose. It ended with a new product that embraced the educational aspect but did something powerful in the actual domain of work too. 

This is where things are going. I'm not sure this "tacit revolution" is going to be quiet though...

Monday 29 August 2022

Anticipation and Learning as Information

This is a follow-on blog to yesterday's on "Visualising Learning Statistically". The most powerful thing in any scientific inquiry is to have two ways of saying similar things. Yesterday, I suggested a way of thinking about learning as emerging distinction-making in terms of relations between normal distributions (in the manner that psychophysicists like Thurstone thought). Among the powerful features of seeing things this way is the fact that the statistics strongly suggests (indeed, insists) that there must be a common origin to the psychological phenomena which produce normal distributions. 

Another way of thinking about this is to consider what happens when any phenomenon is presented to consciousness and a distinction is made. A distinction might be called a "category" - something like "chair", "table", "book", "dog". In psychological experiments, what is measured is not the perception of difference per se, but rather the articulation of difference. In psychophysics for example, this is the expression of judgements of degrees of similarity to normality. That entails not only the perception of something in the environment, but selecting a word for it. To utter a word in response to a stimulus is a communicative act. 

No word can be uttered without having some idea of the effect of that utterance. We do not make words up for things we don't know. Rather like contemporary machine learning, we fit the word which we know as the most likely utterance which we believe will be understood by others. Unlike machine learning, we might not be sure, so we utter the word as a question to see the response, but fundamentally we are making a prediction. Paraphrasing George Kelly (who, alongside sociologists like Parsons and, later, Luhmann, saw cognition as anticipatory): to make a distinction is to anticipate something in the communication system in which we operate. So we should ask: how might this anticipation work?
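The machine-learning analogy can be made concrete with a toy sketch: a bigram model that "utters" whichever word its past experience suggests is the most likely continuation. The corpus and its vocabulary are invented for illustration; the point is only that the utterance is a prediction selected from accumulated experience.

```python
from collections import Counter, defaultdict

# A toy corpus standing in for accumulated communicative experience.
corpus = "the dog sat on the mat the dog ran to the gate".split()

# Count bigram transitions: which word tends to follow which.
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def predict(prev):
    """Select the utterance judged most likely to follow `prev`."""
    return following[prev].most_common(1)[0][0]

print(predict("the"))  # "dog" - the most frequent continuation in the corpus
```

The prediction can be wrong, of course - which is exactly the point about uttering a word as a question and watching the response.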

To anticipate anything is to recognise a pattern which relates some expected experience to a previous experience. As a pattern, there must be something about a phenomenon which is more general than the specifics of any particular instance. The fact that an anticipation is about something present in relation to something past means that there must be a dimension of time. The time dimension works both in the ongoing unfolding of a present experience, and "backwards" in the sense of reflexivity which relates what is present to what is past. This process of identifying the commonality between what is past and what is present is a selection mechanism for the utterance of whatever one thinks is the category that relates to what is currently seen. To create a selection mechanism for an utterance must entail:
  • the selection of an appropriate model of past experience which relates to present experience, from a set of possible models;
  • the management of that set of possible models;
  • the ongoing generation of models from present experience. 
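These three requirements can be sketched as a minimal mechanism: stored models (here just running means, a deliberately crude assumption), selection of the model closest to present experience, and ongoing refinement of models as new experience arrives. The categories and numbers are invented for illustration.

```python
# A minimal sketch of the three-part selection mechanism: (1) select the
# stored model closest to present experience, (2) hold a set of such
# models, (3) keep generating/refining models from ongoing experience.

class SelectionMechanism:
    def __init__(self):
        self.models = {}  # category -> (running mean, observation count)

    def select(self, x):
        """(1) Pick the past model that best matches present experience x."""
        return min(self.models, key=lambda c: abs(self.models[c][0] - x))

    def learn(self, category, x):
        """(3) Generate or refine a model from present experience."""
        mean, n = self.models.get(category, (0.0, 0))
        self.models[category] = ((mean * n + x) / (n + 1), n + 1)

m = SelectionMechanism()
for temp in (30.0, 32.0):  # experiences labelled "hot"
    m.learn("hot", temp)
for temp in (2.0, 4.0):    # experiences labelled "cold"
    m.learn("cold", temp)
print(m.select(28.0))  # "hot" - nearest stored model to the new experience
```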

Yesterday I said that the psychodynamics of distinction-making mean that the ability to refine distinctions is related to the ability to relax distinctions in a different domain - so Freudian "oceanic" experiences are important as an anchor for new distinction-making. That's the kind of statement which might irritate some, but I don't see it as saying anything more than the need for sleep and dreams in order to do work. It is the push and pull of the imagination - much like music, as I wrote here: https://onlinelibrary.wiley.com/doi/full/10.1002/sres.2738

Because making a distinction relies on a selection mechanism which in turn relies on a pattern, we can see a further argument for why the selection mechanism is dynamic between ongoing refinement and "oceanic nothingness". Patterns are typically segmented through repetition. Repetition itself, from an information-theoretical perspective, is "redundancy" - it has an entropy of zero. Thus we can say that the segmentation of pattern is achieved through passages of high entropy followed by low or zero entropy. This helps to explain why repetition (as redundancy) is so important for memory - the essential feature of an effective selection mechanism for identifying a category is the ability to segment patterns of experience from the past and relate them to the future. 

This also reinforces the point that there must be a common biological origin which is responsible for steering this process. Patterns established in communication rely on cellular communication throughout the brain and other organs in the body. Within cells there are also patterns which reference evolutionary processes which themselves are demarcated by nothing. Statistically this can be observed as a normal distribution, but it can also be modelled as a process of evolutionary construction of patterns which act as selection mechanisms for communication. At a cellular level, these points of "nothingness" are homeostatic points of equilibrium between a cell and its environment.

The role of the environment in learning and evolutionary development is critical. The construction of anticipatory systems is a kind of evolutionary dance of endogenising the environment, where specific stages of development are segmented in ways where one stage can be related to other stages. It is this evolutionary dance which is the reason why there is always a distribution of traits and abilities which then give rise to measurable statistical phenomena. 


Sunday 28 August 2022

Visualising Learning Statistically

To talk of learning as a process which we can observe is very difficult. When we teach teachers, we teach "theories" of learning which are just-so stories with little hard evidence to back them up barring a few (now famous) psychological experiments. The resort to teaching theory is partly because this is so hard that we would struggle to decide what we should talk about if we didn't just talk about theory. The irony is that talking about theory can be very boring, encouraging professors who didn't think of any theory themselves to talk endlessly about what's written in textbooks - not exactly an example of good teaching! Ultimately we end up with what is easiest to deliver, rather than what needs to be talked about. 

I think the birth of cybernetics in the 1940s was the best chance we had of remedying this situation, but for various reasons, a lot of this transdisciplinary insight was lost in the 1950s and 60s, as other disciplines (notably psychology) appropriated bits of it but lost sight of its key insights. Now, the growth of machine learning is providing a new impetus to revisit cybernetic thinking, with people like James Bridle leading the way in a revised presentation of these ideas (see his "Ways of Being"). One of the most impressive things about Bridle's book is the fact that he reconnects cybernetics to biology and consciousness. That connection was at the heart of the original thinking in the discipline. The biology/consciousness thing is really important - but isn't it just another just-so story? If we don't have any way of measuring anything, then I'm afraid it is. 

Here perhaps we need to look a bit deeper at the whole issue of "measurement" as it is practised in the social sciences. Another historical development from the 1950s was the increasing dominance of statistical techniques in disciplines like economics. Tony Lawson argues that this was directly connected to the McCarthy period, where anything statistical was "trusted" as scientific and anything "critical" was communist! As Lawson points out in his "Economics and Reality", the greatest economists of the 20th century (including Hayek and Keynes) were highly skeptical of the use of mathematics in economics. 

Statistical techniques are regularly used in academic papers in education to defend some independent variable's impact on learning. These are usually the result of academic training in statistics for researchers - not the result of a critical and scientific inquiry into the applicability of techniques of probability to education. But there are fundamental questions to ask about statistical procedures. These include:

  • Why do natural phenomena reveal normal (Gaussian) distributions in the first place? 
  • What is an independent variable, and why should an independent variable (if such a thing exists) produce a new normal distribution?
  • All statistics is about counting - but what is counted in something like learning, and how are the distinctions made between different elements that are counted? 
  • What happens to the uncertainty about distinction-making in what is counted? (Keynes made this point in his "Treatise on Probability" in his discussion of Hume's distinguishing between eggs.)
  • Where is the observer in the counting process? Are they an independent variable?
  • It is well-recognised that "exogenous variables" are highly significant causal factors - particularly in economics (which is often why economic predictions are wrong). Yet normal distributions arise even when exogenous variables are bracketed-out. Why?
  • While one big problem with statistical techniques is the fact that averages are not specifics, averages nevertheless can sometimes prove useful in making effective interventions. Why? 
  • Why does statistical regression (sometimes) work? (particularly as we see in machine learning)
  • Is a confidence interval uncertainty?
These are the kind of "stupid questions" which never get asked in education research, or anywhere else outside philosophy for that matter. I want here to think about the first one because I think it underpins all the others. 

Normal distributions (calculated using mathematical equations developed by de Moivre, Euler and Gauss in the 18th and 19th centuries) require a statistical mean and standard deviation to produce a model of the likelihood of a set of results. Behind the reliability of these assumptions is the fact that there is - among the phenomena which are measured - some common point of origin from which the variety of possible results can be obtained. Thus the top of a bell curve indicates the result which is maximally probable, having passed through all the possible variations that stem from a common point of origin. 

Mathematically, we can produce a normal distribution from techniques arising from the Central Limit Theorem (CLT), whereby a normal distribution arises from the sums of normalised random data (see https://en.wikipedia.org/wiki/Central_limit_theorem). According to Ramsey theory (https://en.wikipedia.org/wiki/Ramsey%27s_theorem) and others, complete disorder - true randomness - is impossible. So the normal distribution is really a reflection of deeper order arising from a single point of origin. What is this point of origin? What does a normal distribution in educational research really point to?
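The CLT claim is simple to demonstrate: sum a handful of uniform random draws and the sums fall into a bell curve, whatever the shape of the underlying distribution. The choice of 12 draws and 10,000 samples here is arbitrary, chosen so the theoretical mean and standard deviation come out as round numbers.

```python
import random
import statistics

random.seed(0)  # fixed seed so the demonstration is repeatable

# Each observation is the sum of 12 independent uniform(0,1) draws.
# By the CLT, the sums cluster in a bell shape around their mean.
samples = [sum(random.random() for _ in range(12)) for _ in range(10_000)]

mean = statistics.fmean(samples)  # theoretical mean: 12 * 0.5 = 6.0
sd = statistics.stdev(samples)    # theoretical sd: sqrt(12 * 1/12) = 1.0
within_1sd = sum(abs(x - mean) <= sd for x in samples) / len(samples)
print(round(mean, 2), round(sd, 2), round(within_1sd, 2))  # ≈ 6.0 1.0 0.68
```

The roughly 68% of samples within one standard deviation is the familiar signature of the Gaussian - produced here out of nothing but flat, featureless uniform noise, which is the point of the theorem.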

It must lie in biology, and (importantly) the fact that biology itself must have a common point of origin. Because we tend to think of education as a cultural phenomenon, not a natural one, this point is missed. But we are all made of the same physiological stuff. And the components of our physiology have a shared evolutionary history, and it is highly likely that this shared evolutionary history has a point source. So looking at your educational bell curve is really looking at the "red-shift" of biological origins. This is an important reason why it "works".

However, this doesn't explain learning itself - it just helps to explain the diversity of features (behaviour) in a population which can be observed statistically. Much more interesting, however, is to look at how the process of making distinctions arises given that normal distributions are everywhere.

This is why psychophysics is so interesting. The psychophysicists were interested in the distributed differences that different stimuli make on a population. Some differences make big differences in perception: for example, hot and cold. Other differences are harder to distinguish - for example, the difference between Titian and Tintoretto. These differences can also be represented statistically. For example, the orange curve below might be "hot", and the blue curve might be "cold". There is little uncertainty between these distinctions, and within any population, there is no question that what is hot is identified as hot (with a little variation of degree).



But here (below), there is much more uncertainty in distinction making. 

It is this kind of uncertainty in making distinctions between things which characterises learning processes at their outset. Whether it is being able to distinguish the pronunciation of words in a foreign language, or being able to manipulate a new piece of software, among the various categories of distinctions to be made, there is a huge overlap which leaves learners initially confused. 

As the learning process continues, this distinction-making becomes more defined:
So given phenomenon x, the likelihood of correct categorisation of that phenomenon is improved. 
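This sharpening can be put in numbers with a standard signal-detection sketch (an assumption layered on top of the blog's argument, not something the psychophysicists' data is being quoted for): model the two categories as equal-variance normal distributions, place the decision boundary at their midpoint, and compute how often an ideal observer categorises correctly. The means and variances below are invented for illustration.

```python
import math

def correct_categorisation(mu_a, mu_b, sigma):
    """Probability that an ideal observer assigns a stimulus to the right
    category, for two equal-variance normal distributions with equal priors.
    The optimal boundary is the midpoint, giving accuracy Phi(d / (2*sigma)),
    where d is the distance between the category means."""
    d = abs(mu_a - mu_b)
    return 0.5 * (1 + math.erf(d / (2 * sigma * math.sqrt(2))))

# Early learning: the category distributions overlap heavily -> many confusions.
print(round(correct_categorisation(0.0, 0.5, 1.0), 2))  # ≈ 0.6
# Later: the same categories, now well separated -> few confusions.
print(round(correct_categorisation(0.0, 3.0, 1.0), 2))  # ≈ 0.93
```

Moving the curves apart (or narrowing them) is, in this picture, what the learning process does: it turns a 60% guess into a 93% judgement.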

But it is important to remember what these graphs are really telling us - that the Gaussian distribution implies a common point of origin. The second graph is the result of a conditioning process upon natural origins - rather like a cultivated garden. But perhaps more importantly, this is dynamic, where the point of origin is ever-present, and exerts an influence on distinction-making. This may be why, despite increases in the ability to make distinctions in one domain, there is a biological requirement to relax distinction making in other domains, and these domains may be related.

"Oceanic" experiences - those that Freud associated with the "primary process" of the subconscious remain an important part of the overall dynamic of distinction-making. This looks something like this:

We make the mistake of seeing learning in terms of moving towards graphs 1 and 3, without seeing the dynamic pulse which relates graphs 1 and 3 to graphs 2 and 4. But this process is critical - without the oceanic connection to distinctionlessness, the coordination mechanism (i.e. reference to origins) which facilitates higher-order distinctions (graph 3) cannot coordinate itself and is more likely to collapse in a kind of schizophrenia (this is what Freud talked about in terms of the superego taking over and the psychodynamics breaking down). 

Looking at learning like this does two things. First, it invites us to think about our methods of scientific measurement differently - particularly statistics - as a means of looking at life processes as processes which refer to a common origin. Secondly, it gives us a compass for assessing the interventions we make. Our current lack of a compass in education and society is quite obvious.