Thursday 6 February 2020

Why the current phase of Machine Learning will fail

Over the last two years I've been involved in a very interesting project combining educational technology assessment techniques with machine learning for medical diagnostics. At the centre of the project was the idea that human diagnostic expertise tends to be ordinal: experts make judgements about a particular case based on comparisons with judgements about other cases. If judgement is an ordinal process, then the deeper questions concern the communication infrastructure which supports these comparisons, and the ways in which the rich information of comparison is maintained within institutions such as diagnostic centres.

Then there was a technical question: can (and does) machine learning operate in an ordinal way? And more importantly, if machine learning does operate in an ordinal way, can it be used as a means of maintaining the information produced by the ordinal judgements of a group of experts, such that the combined intelligence of human + machine exceeds that of either human-only or machine-only solutions?

The project isn't over yet, but it would not be surprising if the answer to this question is equivocal: yes and no. "Yes" because this ordinal approach to machine learning is the only approach which does not throw away information. The basic problem with all current approaches to machine learning is that models are trained on labelled data so that the resulting classifiers can be mapped automatically onto new data: the complexity of any new case is reduced by the ML algorithm to a classification. That is basically a process of throwing away information - and it is a bad idea. It amplifies a general tendency of IT systems in organisations, which have been doing this for years, and our institutions have suffered as a result.
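To make the contrast concrete, here is a minimal sketch (illustrative only - the case names, labels and comparison function are made up) of the difference between classifying cases, which keeps one label per case, and recording the pairwise judgements themselves, which keeps every comparison:

```python
from itertools import combinations

cases = ["case_A", "case_B", "case_C", "case_D"]

# Classification: each case is collapsed to a single label; the relations
# between cases are discarded.
classification = {"case_A": "benign", "case_B": "malignant",
                  "case_C": "benign", "case_D": "malignant"}

# Ordinal judgement: for every pair, we keep which case the expert judged
# "more severe". Nothing about the comparison is thrown away.
def expert_judgement(a, b):
    # Stand-in for a human (or model) comparison of two cases.
    return min(a, b)

ordinal_record = {(a, b): expert_judgement(a, b)
                  for a, b in combinations(cases, 2)}

print(len(classification))   # 4 labels
print(len(ordinal_record))   # 6 pairwise judgements retained
```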

However, not discarding information means that the amount of information to be processed grows combinatorially - with pairwise comparisons, it grows with the number of combinations, roughly the square of the number of items. It doesn't matter how powerful one's computers are, an algorithm whose training cost grows like this is bad news. Give it 100 images, and it might take a day to train the machine learning on the 4950 pairwise combinations. 200 images give 19900 combinations, 300 give 44850, 400 give 79800, 500 give 124750. So if 4950 combinations take 24 hours, 500 images will take around 600 hours = 25 days. It won't take long before we are measuring the training time in months or years.
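A quick back-of-the-envelope calculation makes the scaling concrete (a sketch, assuming training time grows in proportion to the number of pairwise combinations):

```python
from math import comb

def pairs(n):
    """Number of pairwise comparisons among n images: n choose 2."""
    return comb(n, 2)

# If 100 images (4950 pairs) take 24 hours to train on...
hours_per_pair = 24 / pairs(100)

for n in (100, 200, 300, 400, 500):
    print(n, pairs(n), round(pairs(n) * hours_per_pair, 1), "hours")

# 100 ->   4950 pairs ->  24.0 hours
# 200 ->  19900 pairs ->  96.5 hours
# 300 ->  44850 pairs -> 217.5 hours
# 400 ->  79800 pairs -> 386.9 hours (~16 days)
# 500 -> 124750 pairs -> 604.8 hours (~25 days)
```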

So this isn't realistic. And yet it's the right approach if we don't want to discard information. We don't yet know enough, and no amount of hacking with TensorFlow is going to sort it out.

The basic problem lies in the difference between human cognition and what neural networks can do. The reason we want to retrain the ML algorithm is to update its ordinal rankings so that they reflect the refinements of human experts. This really can only be done by retraining the whole thing with the expanded training set: if we don't retrain the whole thing, there is a risk that a small correction in one part of the ML algorithm has undesirable consequences elsewhere.
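As a rough illustration of why (a minimal Bradley-Terry-style pairwise ranker, not the project's actual model): each case gets a latent score, expert judgements are modelled as probabilities over score differences, and absorbing a new judgement means re-fitting all the scores over the expanded set of pairs - there is no safe local patch.

```python
import numpy as np

def fit_scores(n_cases, judgements, lr=0.1, steps=2000):
    """Fit latent scores s so that P(w beats l) = sigmoid(s[w] - s[l]).

    judgements: list of (winner, loser) index pairs from expert comparisons.
    """
    s = np.zeros(n_cases)
    for _ in range(steps):
        grad = np.zeros(n_cases)
        for w, l in judgements:
            p = 1.0 / (1.0 + np.exp(-(s[w] - s[l])))  # predicted P(w beats l)
            grad[w] += 1.0 - p   # push the winner's score up
            grad[l] -= 1.0 - p   # push the loser's score down
        s += lr * grad           # gradient ascent on the log-likelihood
    return s

judgements = [(0, 1), (1, 2), (0, 2)]          # expert says: 0 > 1 > 2
print(np.argsort(-fit_scores(3, judgements)))  # [0 1 2]

# A new expert refinement (case 3 sits between 1 and 2) cannot be patched in
# locally; the whole model is re-fitted over the expanded set of judgements.
judgements += [(1, 3), (3, 2)]
print(np.argsort(-fit_scores(4, judgements)))  # [0 1 3 2]
```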

Now humans are not like this. We can update our ordinal rankings of things very easily, and we don't suddenly become "stupid" when we do. How do we do it? And if we can understand how we do it, can that help us understand how to get the machine to do it?

I think we may have a few clues as to how we do this, and yes I think at some point in the future it will be possible to get to the next stage of AI where the machine can be retrained like this. But we are a long way off.

The key lies in the way the ML model structures its data through its recursive processes. Although we don't have direct knowledge of exactly how all the variables and classifiers are stored within the ML layers, we get a hint of it when the ML algorithm is "reversed" to produce images which align with its classifiers, as we see with Google's Deep Dream images.

These are basically fractal images which reflect the way the ML convolutional neural network algorithm operates. Looking at a Deep Dream image, we can get some indication of the fractal nature of the structures within the machine learning itself.
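For readers who want to see the mechanism, here is a generic sketch of that "reversal" - Deep-Dream-style activation maximisation, written here with PyTorch and torchvision. The choice of network, layer index, step count and learning rate are arbitrary assumptions for illustration, not Google's actual implementation:

```python
import torch
import torchvision.models as models

# A pretrained convolutional network; any ImageNet model would do here.
vgg = models.vgg16(weights=models.VGG16_Weights.DEFAULT).features.eval()
layer_of_interest = 10                    # arbitrary convolutional layer

image = torch.rand(1, 3, 224, 224, requires_grad=True)   # start from noise
optimizer = torch.optim.Adam([image], lr=0.05)

for step in range(100):
    optimizer.zero_grad()
    x = image
    for i, layer in enumerate(vgg):
        x = layer(x)
        if i == layer_of_interest:
            break
    loss = -x.norm()    # negative, so minimising it *maximises* the activations
    loss.backward()
    optimizer.step()

# `image` now shows the repeating, self-similar structures that Deep Dream
# images are known for: the network's internal patterns imposed on the input.
```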

I strongly suspect that not only our consciousness, but the universe itself, has a similar structure. I am not alone in this view: David Bohm, Karl Pribram and many others have held similar positions. Within quantum mechanics today, the idea that the universe is some kind of "hologram" is quite common, and the hologram is basically another way of describing a fractal (we had holograms long before we could generate fractal images on a computer).

What's important about fractals is that they are anticipatory. This really lies at the heart of how ML works: it is able to anticipate the likely category of data it hasn't seen before (unlike a database, which can only reveal the categories of data it has been told about).
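A trivial sketch of that contrast, with made-up numbers: a lookup table can only answer for keys it has already been given, while even the simplest fitted model ventures an answer for an input it has never seen.

```python
lookup = {1.0: "low", 2.0: "low", 8.0: "high", 9.0: "high"}
print(lookup.get(5.5))                  # None - the "database" has nothing to say

def nearest_neighbour(x, table):
    # Predict the category of the stored value closest to x.
    key = min(table, key=lambda k: abs(k - x))
    return table[key]

print(nearest_neighbour(5.5, lookup))   # 'high' - an anticipation, not a retrieval
```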

What makes fractals awkward - and why the current state of machine learning will fail - is that in order to change the understanding of the machine, the fractal has to be changed. But to change a fractal, you don't just change one value in one place; you have to transform the entire pattern so that it remains consistent while the new knowledge is absorbed.

We know, ultimately, this is possible. Brains - of all kinds - do it. Indeed, all viable systems do it. 
