Category Archives: Thoughts on Thoughts

Wikipedia Humour

My favourite page on Wikipedia is the description of the ‘Lamest Edit Wars’

https://en.wikipedia.org/wiki/Wikipedia:Lamest_edit_wars

Wikipedia is a treasure trove of (very) dry humour about often very controversial topics. Normally, if you’re in a conversation about a controversial topic, you can step out, but an encyclopedia can’t: it is expected to have words on everything.

You can see the workshopping that must have gone into it. I wonder if there are ways to detect the most workshopped phrases? To detect the ‘most controversial*’ parts of Wikipedia? (Although parsing the revision history may give you this.)

“Both frequencies coexist today (Japan uses both) with no great technical reason to prefer one over the other[1] and no apparent desire for complete worldwide standardization.”

https://en.wikipedia.org/wiki/Utility_frequency

If you enjoyed the humour above, you may also enjoy (hattip to AM):

https://github.com/bup/bup#things-that-are-stupid-for-now-but-which-well-fix-later

And my favourite subreddit of them all:
https://www.reddit.com/r/notinteresting

It is truly sublime, including such gems as:
“checking the radiator pipe cover”


*It turns out that Wikipedia has a list of these (of course it does): https://en.wikipedia.org/wiki/Wikipedia:List_of_controversial_issues
This is different from controversies about Wikipedia: https://en.wikipedia.org/wiki/List_of_Wikipedia_controversies
There are even articles in reputable news sources written about this: http://www.cnn.com/2013/07/24/tech/web/controversial-wikipedia-pages/ And research papers: http://arxiv.org/vc/arxiv/papers/1305/1305.5566v1.pdf

TNG: What Might Have Been

So, we were watching the TNG episode ‘Reunion’ http://memory-alpha.wikia.com/wiki/Reunion_%28episode%29, and it got us thinking about what TNG might have been.

K’Ehleyr was such a big and interesting and *alive* character. Imagine if she had been a cast regular. The place she seemed to fit best in our mind was replacing Riker as the Enterprise First Officer.

First, a bit of backstory to set the stage:

It’s often been said that the original Star Trek was based around a ‘Freudian Trio’ of the Ego (Kirk), the Superego (Spock), and the Id (McCoy).
http://tvtropes.org/pmwiki/pmwiki.php/Main/FreudianTrio

Gene Roddenberry’s vision for TNG was that humans would have evolved to no longer ‘need’ the interpersonal bickering which characterized the Spock/McCoy interaction. Some say that this led to ‘too safe’ personal interactions amongst the crew, with the only sources of conflict being Worf’s conservatism, Riker’s devil’s advocate, and the management/engineering interaction* between Picard and Geordi.

This made the writers need to look outside the main cast for sources of conflict. This generally worked well, but wasn’t Wesley’s best performance in the series when he played the sulky teenager being called on the carpet by Picard?

All this is a long winded way of saying that it could have been a very different series with a more varied and emotionally expressive cast.

Back to Suzie Plakson as K’Ehleyr as First Officer. You would have a very different take on the ‘Freudian Trio’, with the calm and rational emotional readings from Troi, and the more aggressive emotions from K’Ehleyr, with Picard bringing it all together. There’s a beautiful scene with K’Ehleyr and Troi talking just after K’Ehleyr has broken a glass table in anger. So much interesting emotional depth to discover and explore!

Also, you’d have the fun dynamic between K’Ehleyr and Worf, with her as his superior officer, much more interesting than the never-really-explored-outside-of-the-book-Imzadi relationship between Riker and Troi.

But alas, TNG was a product of its time and executives. Riker with his daddy issues (which are important, and he carried the part well) must have spoken to those casting, and it can’t have been only because he had the second highest rank on the ship that he got second billing, above all the ‘supporting cast’.

Also, the two women who were most like what we’re suggesting for K’Ehleyr were both written out of the show after the first season, both because they wanted more from their parts on the show. Denise Crosby left to pursue feature films, and Gates McFadden was pushed out because she was insisting on more substantive parts for her character.

It wouldn’t be until Kira Nerys that we would have a character close to what could have been with K’Ehleyr. Maybe in a Mirror Universe someday…

*It’s actually really fun to watch this, especially in the early episodes, where they have a number of classic ‘management/engineering’ conversations, including such gems as ‘I don’t want you to use the word impossible’.

Other interesting notes:

Apparently, ‘Wesley Crusher’ was almost ‘Lesley Crusher’: http://trekmovie.com/2010/08/26/1987-paramount-memo-reveals-actors-auditioning-for-star-trek-tng-cast/

Pages 293-7
http://www.amazon.com/The-Continuing-Mission-Star-Trek/dp/0671025597#reader_0671025597
Interesting notes include the fact that each of the actors had to pass personal interviews with the studio execs, that Marina Sirtis and Denise Crosby were originally cast in the opposite parts, and that Gene had to be convinced at length to choose Patrick Stewart.

Touch Typing

It’s the little things that you notice. I was writing something, and just happened to notice that I was looking off into the distance while I was typing. It was one of those choices I made when I was very young. I was in High School, our school didn’t have a typing class, and I decided I needed to learn how. I don’t even remember why. It might have been my mom’s stories about learning, with those typewriters with no letters on the keys, when she was growing up.

Anyways, I remember taking one course, one of those summer enrichment things, up at Northern Secondary. I seem to recall I also took magic, stained glass, and board games, but those might have been different years. (Come to think of it, it might even have been before high school…) Interestingly, I remember this being my choice, perhaps an odd choice for a 12 year old. I don’t even remember why I thought it would be useful, but I remember acutely that I knew it would be. Perhaps similar to my choice to pursue chemical engineering over computers, as I knew that no matter what I did, I would be using computers.

I remember taking that one course, and it being fun…They had these cool puzzles where they gave you a sequence of commands to type, making simple versions of what I could only find online as ‘typewriter art’: https://www.google.com/search?hl=en&site=imghp&tbm=isch&source=hp&biw=1250&bih=694&q=typewriter+art

(Kind of early ASCII art, I wonder how much crossed over…)

In searching for the above, I found:

http://www.rapidtyping.com/online-typing-games/isogram-puzzle.html

It’s Mastermind, but with words! 😀

Which apparently has also been published:

https://boardgamegeek.com/boardgame/5662/word-mastermind

In a couple of different forms:
https://boardgamegeek.com/image/1029413/word-mastermind
https://boardgamegeek.com/image/1151419/word-mastermind

Anyways, I took these classes, but I don’t remember really using my typing until we had an email group in undergrad called the ‘Mailstrom’, often hitting 3 digits of messages per day, where quick wit (and quicker typing) was key.

I suspect there was also some training from playing computer games, but that would really only train a few keys (mostly ctrl and alt, from that era), and the mental mapping probably wouldn’t be from the hand motion to the letter.

And right now, I’m touch typing this, and it seems so normal/natural. Such a weird skill. Happy typing! 😀

Each Person is Their Own Country

I was in London during the summer of 2000, and one of the expats I met there described the inhabitants as “Each person is their own country”. This was their way of describing how the inhabitants of London (didn’t) interact with each other.

My experience there then was similar: the only friends I made were other travelers, people from small towns, expats, and a most excellent MSF gentleman from Germany. I also had an experience I regret at the Church of Scientology, but we will speak no more of that.

More relevantly, we were talking at lunch today about large agglomerations of people vs. small towns, and wondering if there is something inherent to large cities that makes people colder or more distant.

AM suggested that the interactions you would expect in a small town, such as acknowledging each other as you walk down the street, simply become impractical when you encounter thousands of people each day. It’s also possible that people become more and more indistinguishable once there are so many of them, that it becomes a blur, and your mind automatically groups them or filters them out, as they’re too close to the average of ‘how much do I need to pay attention to this person today’. People whom you have befriended, family, co-workers all fall outside this category, but you can even see some of the effects of this if you’re working in a large organization of tens of thousands of people. Your brain will automatically take shortcuts, and group people, whether you want to or not; you have to actively fight this if you want to think of all of them as individuals.

Other possibilities include concerns for safety, concerns that the only reason people approach you on the street is to ask for money or to save your immortal soul, or just that the brain is set up to see 100-200 people as ‘your tribe’, and all others become NPCs*. Once again, this is something you have to fight against, or train your brain out of doing.

Finding “The conversation I can only have with you” can be non-trivial when your brain is full.

But still worth it. 😀

*https://en.wikipedia.org/wiki/Non-player_character

The Six Answers to a ‘Yes or No’ Question

There exist the traditional five answers to a ‘Yes or No’ question:

– ‘Yes’, indicating complete agreement
– ‘No’, indicating complete disagreement
– ‘Maybe’, indicating something in between on that axis
– ‘I don’t know’, indicating a lack of relevant information
– ‘Mu*’, or ‘unask the question, it contains an incorrect assumption’

Recently, J EB (nee K) mentioned that ‘like’ is a new answer to a yes/no question. (On my post ‘No Spoilers Awaken’)

The Facebook ‘like’ seems to mean a number of different, sometimes overlapping things…
– ‘I like this post and I want you to know’
– ‘I agree with you’
– ‘I’m curious to hear the answer to this question’
– ‘I support you’
– ‘I understand your feelings’

It is very clear (to me) that ‘like’ is a valid answer to a ‘Yes or No’ question, and it is most delightfully ambiguous. It feels more discovered than invented, as we’ve always had ‘interesting question’, it was just rarely expressed by random people around the world, in response to a conversation they are not explicitly a part of.

*For those who wish a slightly more formal treatment: https://en.wikipedia.org/wiki/Mu_%28negative%29 specifically: https://en.wikipedia.org/wiki/Mu_%28negative%29#.22Unasking.22_the_question

You may also be interested in the somewhat related: https://en.wikipedia.org/wiki/Many-valued_logic

But my favourite is probably: https://en.wikipedia.org/wiki/Not_even_wrong (Thanks DJ!) This is one way to say ‘Mu’, but usually only if you’re trying to be insulting.

Similar but not Identical

How do you make a playlist of songs which are similar, but not identical? Ideally, you want to play music that the user is likely to want to listen to*, but you probably don’t want to play the same song, even in different remixes, over and over. So, how do you detect similarities, while removing identicals, even when they may not be so identical?

In practice, there is probably a lot of separation between the spike of identical songs and those that are merely similar. You could also use the Web 2.0 crutch of looking at what people searched after other songs, and/or the machine learning approach of trying to put songs after one another and seeing what people skipped or turned to from the suggestions instead.
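To make the ‘spike of identicals vs. merely similar’ idea concrete, here is a minimal sketch. It assumes each song has already been boiled down to a numeric feature vector (the vectors, the titles, and both thresholds below are made up for illustration), and it uses cosine similarity, which is only one of many possible similarity measures.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def pick_similar(seed, library, similar_min=0.5, duplicate_min=0.98):
    """Return titles similar to `seed` but not near-identical to it.

    Both thresholds are assumed values; in practice you would pick them by
    looking for the gap between the 'identical' spike and everything else.
    """
    picks = []
    for title, features in library.items():
        score = cosine(seed, features)
        if similar_min <= score < duplicate_min:
            picks.append(title)
    return picks

# Hypothetical two-feature library, purely for illustration.
library = {
    "seed (remix)": (0.99, 0.05),  # nearly the same song
    "cousin": (0.8, 0.6),          # related, but clearly different
    "stranger": (0.0, 1.0),        # unrelated
}
print(pick_similar((1.0, 0.0), library))
```

The upper threshold is what throws out the remixes; the lower one throws out everything unrelated. Skip data from real listeners would presumably be a better guide to where those lines belong than any fixed number.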

Similarly**, cleaning data of artifacts is still an open problem. It feels to me like a similar one. You’re trying to remove this *huge* signal which is overwhelming your sensors so you can get at what you actually care about. Assuming both the artifacts and the signal are within your detection limit***, you have to determine the nature of the artifact, both where it is in the signal spectrum, and what axes it spreads through and how. It might also have related harmonics****.

Another related problem is the removal of 60Hz***** noise from all sorts of electronics. I’m not sure what sorts of filters are used, but even band reject filters have non-ideal behaviour, so perhaps smoothing the edges in a known way works better, but this is all speculation. I mostly like using the field around power cords to test oscilloscopes and to get people to think about electric fields.
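For the curious, one common textbook approach is a second-order notch (band-reject) filter centred on the mains frequency. The sketch below is not a claim about what any particular piece of equipment does; the sample rate, the Q value, and the test signal are all assumptions chosen for illustration, and the coefficient formulas follow the widely circulated ‘Audio EQ Cookbook’ form of the notch.

```python
import cmath
import math

def notch_coefficients(f0, fs, q=30.0):
    """Second-order notch (band-reject) coefficients, normalised so a[0] == 1."""
    w0 = 2.0 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b = (1.0 / a0, -2.0 * math.cos(w0) / a0, 1.0 / a0)
    a = (1.0, -2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0)
    return b, a

def apply_biquad(b, a, samples):
    """Run the difference equation sample by sample."""
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x0 in samples:
        y0 = b[0] * x0 + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        out.append(y0)
        x2, x1 = x1, x0
        y2, y1 = y1, y0
    return out

def amplitude_at(freq, fs, samples):
    """Amplitude of one frequency component (single-bin Fourier sum)."""
    total = sum(s * cmath.exp(-2j * math.pi * freq * n / fs)
                for n, s in enumerate(samples))
    return 2.0 * abs(total) / len(samples)

fs = 1000.0  # assumed sample rate in Hz
signal = [math.sin(2 * math.pi * 5 * n / fs)            # what we care about
          + 5.0 * math.sin(2 * math.pi * 60 * n / fs)   # mains hum, 5x larger
          for n in range(2000)]
b, a = notch_coefficients(60.0, fs)
filtered = apply_biquad(b, a, signal)
tail = filtered[1000:]  # skip the filter's start-up transient
print(round(amplitude_at(5, fs, tail), 2), round(amplitude_at(60, fs, tail), 2))
```

The start-up transient that gets skipped here is one example of the non-ideal behaviour mentioned above: a narrower notch (higher Q) disturbs the neighbouring frequencies less, but takes longer to settle.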

But back to artifact removal. I don’t have particular insights right now, outside of specific problem spaces. I just think it would be a really cool problem to work on (and one that people work on in a specific way all the time).

*Or perhaps something just similar enough that you’ve been paid enough to play.

**But not identically.

***My favourite procedure/process is the one I learned from an analytical chemist, which is that the signal has to be 3x the noise for you to consider it signal.

****I’m using signal processing as an analogy, but the concept is the same for other artifact removal, just different math.

*****50Hz across the pond

Solution Rotation

So, sometimes when someone asks me a question, I feel like I’m rotating through a number of possible solutions/solution types, like rotating through different options in a leather punch. https://www.google.com/search?tbm=isch&q=leather+punch

I first noticed this in a conversation with Garland Marshall, one of my favourite profs. at WashU: https://biochem.wustl.edu/faculty/faculty/garland-marshall. He’d asked me a question about how one would determine the structure of a binding site of a molecule too difficult to crystallize, too large to NMR, and impossible to get a structure with a bound ligand.

How do you come up with the structure of the binding site? I remember rotating through a number of different options, mostly focused on polling the ligand in various ways.

– Does this happen to other people?
– Is there a neuronal definition/description of this?
– What does this mean?
– Other types of analogies?

On the ‘neuronal pathway’ front, it could be something like activating different pathways in sequence, doing it manually, rather than letting your brain activate all of them at the same time, then aggressively pruning them (to save energy). So, you would actively control your thoughts, to try out each channel independently, and submit them to more rigorous logic, to make sure you hadn’t left anything out. Somewhat like taking the ‘mental shackles off’, asking an audience ‘what ideas would your most creative and silly friend have about what to do with a brick?’, rather than ‘what ideas would you have about what to do with a brick*?’

*This seems to have been adapted from the Torrance Tests of Creative Thinking https://en.wikipedia.org/wiki/Torrance_Tests_of_Creative_Thinking

Problem Solving Examples (With some Machine Learning)

So, in a previous post, (http://nayrb.org/~blog/2015/12/25/automation-and-machine-learning/), we talked about some methods to help you decide whether you actually needed Machine Learning or not to solve your problem. This post talks about various problem solving approaches and which types of problems they can make tractable.

I started my career fascinated by protein folding and protein design. By the time I got there, they had narrowed the question down to one of search: ‘Given this physics-based scoring function, how do I find the optimal configuration of this molecule?’ There were a number of different techniques they were using: gradient descent, Monte Carlo, simulated annealing, but they all boiled down to finding the optimal solution to an NP-Complete problem.
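As a rough illustration of what ‘search over a scoring function’ looks like, here is a toy simulated annealing sketch. The one-dimensional `energy` function is invented purely for illustration (a real protein scoring function has thousands of dimensions), and the step size, temperatures, and step count are arbitrary assumptions.

```python
import math
import random

def energy(x):
    """Toy 'scoring function' with several local minima (purely illustrative)."""
    return x * x + 10.0 * math.sin(3.0 * x)

def simulated_annealing(start, steps=5000, t_start=10.0, t_end=0.01, seed=0):
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    current, current_e = start, energy(start)
    best, best_e = current, current_e
    for step in range(steps):
        # Geometric cooling from t_start down to t_end.
        temperature = t_start * (t_end / t_start) ** (step / (steps - 1))
        candidate = current + rng.gauss(0.0, 0.5)  # random Monte Carlo move
        candidate_e = energy(candidate)
        delta = candidate_e - current_e
        # Metropolis rule: always accept improvements, sometimes accept worse.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_e = candidate, candidate_e
            if current_e < best_e:
                best, best_e = current, current_e
    return best, best_e

best_x, best_e = simulated_annealing(start=4.0)
print(f"best x = {best_x:.3f}, energy = {best_e:.3f}")
```

Plain gradient descent from the same starting point would slide into the nearest dip and stay there; accepting the occasional worse move while the temperature is high is what lets the search climb out of local minima.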

As we know that biological systems can perform protein folding quickly, there must be some algorithm which can do this (even if it means simulating each individual electron). This can then be restated as a simulation/decision question, from the perspective of a cell/physics. Many other search problems have similar human-like or physics-like easier solutions (ways of finding the NP-Complete verifier). For example, as a traveling salesperson, you would look at the map, and be able to narrow down the routes to some smaller number, or be able to quickly narrow down the options to a small number of sets of routes.

In many ways, this is the ‘holy grail’ of Machine Learning, the ability for a machine to step away from what we tell it, and to be able to solve the problem in a more direct way. Heuristics are an attempt to solve this problem, but they’re always somewhat rules-based.

Next is clustering, best used for differentiating between different groups of things so that you can make a decision. My favourite is ‘Flow Cytometry’ https://en.wikipedia.org/wiki/Flow_cytometry, where you’re trying to differentiate different groups of cells, basically through clustering on a 2-D graph of the brightness of various fluorescent cell markers.
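Here is a minimal k-means sketch of that kind of 2-D clustering. The ‘cells’ are made-up pairs of marker brightnesses, and k-means is just one convenient clustering method for illustration; real flow cytometry analysis typically involves gating and more careful statistics than this.

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def farthest_first(points, k):
    """Deterministic seeding: start at points[0], then keep adding the point
    farthest from every centroid chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(
            points, key=lambda p: min(squared_distance(p, c) for c in centroids)))
    return centroids

def k_means(points, k, iterations=20):
    centroids = farthest_first(points, k)
    labels = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda i: squared_distance(p, centroids[i]))
                  for p in points]
        # Update step: each centroid moves to the mean of its members.
        for i in range(k):
            members = [p for p, label in zip(points, labels) if label == i]
            if members:  # leave an empty cluster's centroid where it was
                centroids[i] = tuple(sum(axis) / len(members)
                                     for axis in zip(*members))
    return labels, centroids

# Hypothetical (marker A brightness, marker B brightness) readings.
cells = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1),   # a dim population
         (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]   # a bright population
labels, centroids = k_means(cells, k=2)
print(labels, centroids)
```

Note that you have to tell it how many groups to look for (`k`), which is exactly the sort of decision a human looking at the 2-D plot makes without thinking about it.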

Customer persona clustering is another example, such as you might do for segmentation, where standard groups like age or location would not be good enough.

Machine Learning problems such as the Netflix challenge http://www.netflixprize.com/, where you want a large degree of accuracy in your answer, require the use of a number of techniques. (The problem was to take a list of customer movie ratings and predict how those customers would rate other movies.)

First, you need to clean and normalize the data. The authors were also able to separate the general opinion of each movie from the specific opinion each person had about each movie. (Each of these was about as important to the overall result.) Each of these normalizations or bias removals would likely have been done with some form of machine learning, suggesting that any comprehensive usage would require multiple pipelines or channels, probably directed by some master channels* learning from which of them were the most effective.
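A heavily simplified sketch of that separation: treat each rating as an overall average, plus how much everyone likes that movie, plus how generous that person tends to be. Whatever is left over is the person’s specific opinion of that specific movie. This is an unregularized baseline with made-up users and ratings, not a reconstruction of what the prize teams actually built.

```python
from collections import defaultdict

def mean(values):
    return sum(values) / len(values)

def fit_baseline(ratings):
    """ratings: list of (user, movie, stars). Returns (mu, movie_bias, user_bias)."""
    mu = mean([stars for _, _, stars in ratings])
    # General opinion of each movie, relative to the overall average.
    by_movie = defaultdict(list)
    for _, movie, stars in ratings:
        by_movie[movie].append(stars - mu)
    movie_bias = {movie: mean(res) for movie, res in by_movie.items()}
    # How generous each user is, once the movie's general opinion is removed.
    by_user = defaultdict(list)
    for user, movie, stars in ratings:
        by_user[user].append(stars - mu - movie_bias[movie])
    user_bias = {user: mean(res) for user, res in by_user.items()}
    return mu, movie_bias, user_bias

def predict(model, user, movie):
    """Baseline guess; unknown users or movies fall back to a bias of zero."""
    mu, movie_bias, user_bias = model
    return mu + movie_bias.get(movie, 0.0) + user_bias.get(user, 0.0)

# Hypothetical ratings, purely for illustration.
ratings = [("ann", "A", 5), ("ann", "B", 3),
           ("bob", "A", 4), ("bob", "B", 2),
           ("cat", "A", 3)]
model = fit_baseline(ratings)
print(predict(model, "cat", "B"))
```

Anything cleverer then only has to explain the residual, `stars - predict(...)`, which is the ‘specific opinion’ part.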

I wonder how much of what we do as humans involves breaking down the problem, to divide and conquer. When we’re asked for a movie recommendation, do we think of good movies first, then what that person would think of? Personally, I feel I get my best results when I try to put myself in that person’s shoes, suggesting there may be a long way still to go.

Perhaps looking at groups of movies, or some sort of tagging, to get at whatever ‘genes’ may be underneath, as you may like certain things about movies which are only imperfectly captured by how people like them similarly. (Or perhaps, the data is big enough to capture all of this. It’s fun to speculate. 😀 )

*This suggests a hierarchy, which is only one way of seeing the structure. Other views are possible, but outside the scope.

Automation and Machine Learning

When we ask a computer for help with a task, what are we asking for?

1) Help with automating a repetitive task
2) Help with a decision

1) Help with automating a repetitive task
There are various ways you can automate a repetitive task. You can:
a) Ask your computer to do the same thing again and again, regardless of input (display the home page)
b) Give it some simple rules to follow (if they try to navigate to a non-existent page, show them a 404)
c) Give it some complex or not fully understood rules to follow (based on our tests, these are the solutions you should attempt, in this order)
d) Give it inputs, and have it adapt (‘Watch me perform this industrial assembly task, now you do it’)

2) Help with a decision
There are various ways you can use a computer to help with a decision. You can:
a) Display data in various interesting ways (Data Visualization)
b) Give it the data and some rules to follow (Standard decision automation)
c) Give it the data and a desired output/scoring function (Supervised/Reinforcement Learning)
d) Give it the data and nothing else* (Unsupervised Learning)

This is somewhat of a false dichotomy, as adding new types of decisions allows more and more automation.

– Search (inputting words, pictures, video into a search engine and asking for a result) generally started with 2.a) (Data Display), and seems to be trying to move up the decision hierarchy, anticipating questions and the rules the user would want it to follow. This seems to be generally done with statistics, but I expect this will switch over to pattern-finding neural nets.
– Clustering (throwing a bunch of data into the hopper and getting groupings back) is also mostly in the Data Visualization bucket. It could also be an input into a machine learning algorithm, which would then be trained to make decisions based on these clusters.
– Machine Learning (giving a bunch of data and getting a decision or pattern out) can be used for most or all of the options above, and similar to how computers have gotten ‘fast enough’, Machine Learning is becoming ‘good enough’ or ‘easy enough’ to replace many of the above.

So, as a human, when do you choose each of these? Assuming the options get more difficult going down the list, you would:

1) Start by googling various things (mostly to see what has been done before***).
2) You could then look at the data, clean it, and try clustering it into groups, to see if any of them made sense for the decision you wanted to make.
3) If neither of these worked, or if you wanted more, you could derive a scoring function for the output you wanted, then supply a Machine Learning algorithm with a substantial amount of data, and see how optimal it could make the decision.
4) If you don’t even know what decision you want, or are having difficulty making a scoring function, you could throw the data into an unsupervised learning hopper and see what comes out.

At each of the steps above, you can hive off parts and automate them, either using rules derived from the patterns you’ve found, or using flexible rules from the Machine Learning algorithm. You may find you can accomplish most of your task without having to resort to complex or incompletely understood algorithms.

More examples in subsequent posts. Stay tuned. As you can tell, the categories above have not fully crystallized.

*Unsupervised Learning has a number of levels** in it, such as ‘Find Features’, ‘What is the Question?’, ‘Why?’, etc…

**Not that everything is hierarchical, but this is convenient for discussion

***This is the ‘literature review’ portion of anything we do now

Wikipedia:Categories

So, you may know that the English-language Wikipedia has more than 5 million pages, with 10 edits/s and about 800 new articles per day.

What you may or may not know about is the intense and detailed structure that has grown up inside Wikipedia. Consider the following page:

https://en.wikipedia.org/wiki/Category:Battles_involving_England


This category includes historical battles in which unified Kingdom of England (10th century–1707) participated. Please see the category guidelines for more information.

See Category:Battles involving the Britons and Category:Battles involving the Anglo-Saxons for earlier battles.
See Category:Battles involving the United Kingdom for later battles.

Subcategories

This category has the following 19 subcategories, out of 19 total.

Think about what this means. Given the propensity for people to argue*, there were probably discussions about all aspects of this, such as which were the appropriate progenitor and successor states, what qualifies as a battle, how to group categories and sub-categories, and that’s even before you argue about any one specific article. Of the many reasons I love Wikipedia, possibly the most useful is its ability to direct human arguing into something more useful.

Note that this is one category, and as of today, there are 361703 categories and sub-categories in English Wikipedia.

Also, I learned that ‘Deira’ https://en.wikipedia.org/wiki/Deira was a real place, and possibly not just made up by David and Leigh Eddings. (Although, given how full namespaces currently are, it’s often difficult to know.)

*This is my favourite Wikipedia article: https://en.wikipedia.org/wiki/Wikipedia:Lamest_edit_wars