Problem Solving Examples (With some Machine Learning)

So, in a previous post, (http://nayrb.org/~blog/2015/12/25/automation-and-machine-learning/), we talked about some methods to help you decide whether you actually needed Machine Learning or not to solve your problem. This post talks about some various different problem solving approaches and which types of problems they can make tractable.

I started my career fascinated by protein folding and protein design. By the time I got there, they had narrowed the question down to one of search: ‘Given this physics-based scoring function, how do I find the optimal configuration of this molecule’? There were a number of different techniques they were using: gradient descent, monte carlo, simulated annealing, but they all boiled down to finding the optimal solution to an NP-Complete problem.

As we know that biological systems can perform protein folding quickly, there must be some algorithm which can do this (even if it means simulating each individual electron). This can then be restated as a simulation/decision question, from the perspective of a cell/physics. Many other search problems have similar human-like or physics-like easier solutions (ways of finding the NP-Complete verifier). For example, as a traveling salesperson, you would look at the map, and be able to narrow down the routes to some smaller number, or be able to quickly narrow down the options to a small number of sets of routes.

In many ways, this is the ‘holy grail’ of Machine Learning, the ability for a machine to step away from what we tell it, and to be able to solve the problem in a more direct way. Heuristics are an attempt to solve this problem, but they’re always somewhat rules-based.

Next is clustering, best used for differentiating between different groups of things so that you can make a decision. My favourite is ‘Flow Cytometry’ https://en.wikipedia.org/wiki/Flow_cytometry, where you’re trying to differentiate different groups of cells, basically through clustering on a 2-D graph of the brightness of various fluorescent cell markers.

Customer persona clustering is another example, such as you might do for segmentation, where standard groups like age or location would not be good enough.

Machine Learning problems such as the Netflix challenge http://www.netflixprize.com/, where you want a large degree of accuracy in your answer, require the use of a number of techniques. (The problem was to take a list of customer movie ratings and predict how those customers would rate other movies.)

First, you need to clean and normalize the data. The authors were also able to separate the general opinion of each movie from the specific opinion each person had about each movie. (Each of these was about as important to the overall result.) Each of these normalizations or bias removals would likely have been done with some form of machine learning, suggesting that any comprehensive usage would require multiple pipelines or channels, probably directed by some master channels* learning from which of them were the most effective.

I wonder how much of what we do as humans involves breaking down the problem, to divide and conquer. When we’re asked for a movie recommendation, do we think of good movies first, then what that person would think of? Personally, I feel I get my best results when I try to put myself in that person’s shoes, suggesting there may be a long way still to go.

Perhaps looking at groups of movies, or some sort of tagging, to get at whatever ‘genes’ may be underneath, as you may like certain things about movies which are only imperfectly captured by how people like them similarly. (Or perhaps, the data is big enough to capture all of this. It’s fun to speculate. 😀 )

*This suggests a hierarchy, which is only one way of seeing the structure. Other views are possible, but outside the scope.

Automation and Machine Learning

When we ask a computer for help with a task, what are we asking for?

1) Help with automating a repetitive task
2) Help with a decision

1) Help with automating a repetitive task
There are various ways you can automate a repetitive task. You can:
a) Ask your computer to do the same thing again and again, regardless of input (display the home page)
b) Give it some simple rules to follow (if they try to navigate to a non-existent page, show them a 404)
c) Give it some complex or not fully understood rules to follow (based on our tests, these are the solutions you should attempt, in this order)
d) Give it inputs, and have it adapt (‘Watch me perform this industrial assembly task, now you do it’)

2) Help with a decision
There are various different ways you can use a computer to help with a decision. You can:
a) Display data in various interesting ways (Data Visualization)
b) Give it the data and some rules to follow (Standard decision automation)
c) Give it the data and a desired output/scoring function (Supervised/Reinforcement Learning)
d) Give it the data and nothing else* (Unsupervised Learning)

This is somewhat of a false dichotomy, as adding new types of decisions allows more and more automation.

– Search (inputting words, pictures, video into a search engine and asking for a result) generally started with 2.a) (Data Display), and seems to be trying to move up the decision hierarchy, anticipating questions and the rules the user would want it to follow. This seems to be generally done with statistics, but I expect this would be switching over to pattern-finding neural nets
– Clustering (throwing a bunch of data into the hopper and getting groupings back) is also mostly in the Data Visualization bucket. It could also be an input into a machine learning algorithm, which would then be trained to make decisions based on these clusters
– Machine Learning (giving a bunch of data and getting a decision or pattern out) can be used for most or all of the options above, and similar to how computers have gotten ‘fast enough’, Machine Learning is becoming ‘good enough’ or ‘easy enough’ to replace many of the above.

So, as a human, when do you choose each of these? Assuming the options get more difficult going down the list, you would:

1) Start by googling various things (mostly to see what has been done before***).
2) You could then look at the data, clean it, and try clustering it into groups, to see if any of them made sense for the decision you wanted to make.
3) If neither of these worked, or if you wanted more, you could derive a scoring function for the output you wanted, then supply a Machine Learning algorithm with a substantial amount of data, and see how optimal it could make the decision.
4) If you don’t even know what decision you want, or are having difficulty making a scoring function, you could throw the data into an unsupervised learning hopper and see what comes out.

At each of the steps above, you can hive off parts and automate them, either using rules derived from the patterns you’ve found, or using flexible rules from the Machine Learning algorithm. You may find you can accomplish most of your task without having to resort to complex or incompletely understood algorithms.

More examples in subsequent posts. Stay tuned. As you can tell, the categories above have not fully crystallized.

*Unsupervised Learning has a number of levels** in it, such as ‘Find Features’, ‘What is the Question?’, ‘Why?’, etc…

**Not that everything is hierarchical, but this is convenient for discussion

***This is the ‘literature review’ portion of anything we do now

‘Machigne’

Aside from being an excellently cromulent word, ‘machigne’ is what I often type when attempting to type ‘machine’. It seems to be because the ‘g’ allows for all of the transitions between letters to go from left hand to right hand and back:

machine: l,r,l,r,l,l,r*
machigne: l,r,l,r,l,r,l,r

*Note that this was really difficult to type, as it involved using one of the weakest fingers for two consecutive characters (‘l’, ‘,’).

Class Divisions

There are many computer games out there which have or purport to give the player the fighter/mage/thief* experience. The canonical examples for me are ‘Quest for Glory’ (Sierra) and ‘Keef the Thief’, probably because they were the first ones I played in the genre.

Most of these games will have different skills you can use to overcome the various obstacles the games throw your way. I’m interested in looking at these skills, and seeing how much each of the games actually lets you play a fighter, mage, or thief, and also how much each of the skills falls under one or more of these categories. But for this, well need some definitions…

The mage/non-mage division is probably the easiest to define, good canonical examples are ‘Ars Magica’ and the ‘Might and Magic’ series, where there are various types of magic users various types of non-magic users.

Mages:
– Basically, a mage is someone who can do things that are outside of what a human could do at a medieval tech. level**.
– They also have some sort of internal power reserve which they use to perform these feats, a power reserve which recharges over time or when they rest. This power reserve is sometimes the same as ‘stamina’ (GURPS), ans sometimes not (D&D, TES, etc…)

‘Fighters’ and ‘Thieves’ have skills that one could conceivably acquire as a very well-trained human. The main difference is in the techniques used to solve problems.

Fighters:
– Tend to use very straight-forward methods to solve problems, often involving combat.
– Fighters will tend to have more combat skills and options than others

Thieves:
– Thieves tend to use more stealth, trying to find an adversary’s weak points, and using more non-combat skills, many of which have less than legal uses.
– Thieves will tend to have a wider variety of skills than others

There are also various skills which any ‘adventurer’ would require to get by in a fantasy world. Depending on the particular game and its game balance, these skills may fall under any one of the ‘classes’ above.

*I’m stepping somewhat away from the D&D Fighter/Mage/Cleric/Thief paradigm, but may revisit this in the future. There are a large number of games which merge all magic users into one, and that’s what I want to explore. Also, the idea of a separate class of ‘healers’ is an interesting concept/conceit, and it may be interesting to see how this is reflective of a society where people damage themselves all the time, and rely on one member of the group to heal them, rather than doing things in a more sustainable/mindful manner…

**’Tech. levels’ were first codified (that I saw) by GURPS: http://gurps.wikia.com/wiki/Tech_Level. Most fantasy-type games feel like between 2 and 3 on this scale. Game balance wrt different ‘magic spells’ and their resepective tech. levels is a whole different interesting topic.

Personal Character Classes

Around the internet, you will find many quizzes which purport to tell you which archetypical ‘character class’ you most belong to. As you would expect, many of these quizzes are clickbait, and even if they weren’t, it’s relatively unlikely that the authors would have taken the time to poll some ‘gold standard*’ group of people to a statistically significant degree.

I’ve been (very slowly) taking a different tack. The plan was to write a story written from the perspective of a character falling into each each of each of the archetypes, to see which one(s) spoke to me the most**,***.

The first installment, ‘Druid’ currently has two parts available here:

Druid

Barriers

*It does seem somewhat absurd to have a ‘gold standard’ of correctness for which fictional archetype one best fits into, but what can you do?

**The best analogy for this for me comes from the struggles of the protagonists in the Modesitt books ‘The Magic of Recluce’ and ‘The Magic Engineer’, where they say things out loud and see how their internal mental map/conscience twinges to see how true they are. Another analogy is presented by Paul Graham here: http://www.paulgraham.com/essay.html where he talks about ‘essays’ being trying out ideas in written form to see how well they work.

***Note that this does not get into issues of differences between what you feel as a person vs. what type of character you would play in a game.

Wikipedia:Categories

So, you may know that the English-language Wikipedia has more than 5 million pages, with 10 edits/s and about 800 new articles per day.

What you may or may not know about is the intense and detailed structure that has grown up inside Wikipedia. Consider the following page:

https://en.wikipedia.org/wiki/Category:Battles_involving_England


This category includes historical battles in which unified Kingdom of England (10th century–1707) participated. Please see the category guidelines for more information.

See Category:Battles involving the Britons and Category:Battles involving the Anglo-Saxons for earlier battles.
See Category:Battles involving the United Kingdom for later battles.

Subcategories

This category has the following 19 subcategories, out of 19 total.

Think about what this means. Given the propensity for people to argue*, there were probably discussions about all aspects of this, such as which were the appropriate progentior and successor states, what qualifies as a battle, how to group categories and sub-categories, and that’s even before you argue about any one specific article. Of the many reasons I love Wikipedia, possibly the most useful is its ability to direct human arguing into something more useful.

Note that this is one category, and as of today, there are 361703 categories and sub-categories in English Wikipedia.

Also, I learned that ‘Deira’ https://en.wikipedia.org/wiki/Deira was a real place, and possibly not just made up by David and Leigh Eddings. (Although, given how full namespaces currently are, it’s often difficult to know.)

*This is my favourite Wikipedia article: https://en.wikipedia.org/wiki/Wikipedia:Lamest_edit_wars

“That’s Not Funny!”

Scene: I’m in a conversation with two students, one male, one female, probably high school age.

The male student says: “How many feminists does it take to screw in a lightbulb?”
Me, not missing a beat: “That’s not funny!”
Female student: “Yes!”

For those of who don’t see the joke, “That’s not funny!” is often the punchline to “How many feminists does it take to screw in a lightbulb?, playing on the perception of the sense of humour of feminists and feminism*.

As it turns out, both parties took my comment at face value (which was mostly what I intended), and it turned into a small teachable moment.

*This feels like a whole long discussion, mostly sad, about how people felt that making ‘punching down**’ jokes about women no longer socially acceptable was a ‘bad thing’. I feel like much has been said about this, and I have nothing useful to add.

**Perhaps more interestingly, it feels like this whole concept was aired and discussed long before the words ‘punching down’ (meaning making fun of those less fortunate or less privileged) entered the vernacular.

Tenagra, on the Ocean

Pooh and Piglet at Tanagra
“Pooh?” said Piglet.
“Yes, Piglet?” said Pooh.
“Darmok and Jalad at Tanagra,” said Piglet.
“Shaka, when the walls fell.” said Pooh.
Pic by Cathy Wappel
Words by Michael G Munz

The above pic came across my fb feed this morning.

Some random thoughts about this.

1) There exists this subreddit: https://www.reddit.com/r/Tenagra/ which is, in the internet way, developing a similar-type language.

2)

TEXT OF COMIC:
Hi, Abby. How are you?
Spock’s response to his mother’s question at the end of The Voyage Home.
Huh? What’s up with your communication skills today?
The aliens in “Darmok and Jalad.”
You’re… communicating only in obscure references to Star Trek.
Decker’s answer to Kirk saying “You saved the ship” in The Motion Picture.
And WHY exactly are you doing this?
The 74th Rule of Acquisition.
“Knowledge equals profit?” Okay, what the heck are you trying to build your knowledge for?
Kirk’s exclamation after Spock’s death.
Oh, that’s right. You’re going to a con.
MOUSEOVER TEXT: whenever I want to get laid, I just tell John ‘Spock in Amok Time’ and he knows EXACTLY what I mean
http://www.johnanderikaspeak.com/an/2012/05/12/1168/

3) The title of the post is somewhat ambiguous. It could be a reference to Dylan Thomas’ ‘A grief ago’, in its use of parts of speech, saying something deep about Tenagra, and the myths behind it leaving us behind on the seas of fleeting cultural memory… Or it could just be commenting that Tenagra was an island.

If you don’t understand the reference, this might help:
https://en.wikipedia.org/wiki/Darmok

Regulatory Capture and NP-Completeness

A common conversation in our household:

S: “So, why do you think happens?”
Me: “Regulatory Capture.”
S: “Oh. Right. :(”

I like to sort this with my general wont to cut Gordian knots*, but perhaps it is also useful because it allows you to reduce the problem under discussion to a well-known problem, which is known to be insoluble in very specific ways.

Perhaps we will find that the solution to the P=?NP problem is the same as the problem of regulatory capture**. I think it’s more likely that P=?NP will be solved first.

*Or to be a troll.

**Regulatory capture I think was first described for me in ‘Yes, Prime Minister’, as the inevitable co-option of the body which regulates an industry by the industry which it regulates. This is generally because they talk to each other the most, combined with the huge financial incentives.