Similar but not Identical

How do you make a playlist of songs which are similar, but not identical? Ideally, you want to play music that the user is likely to want to listen to*, but you probably don’t want to play the same song, even in different remixes, over and over. So, how do you detect similarities, while removing identicals, even when they may not be so identical?

In practice, there is probably a lot of separation between the spike of identical songs and those that are merely similar. You could also use the Web 2.0 crutch of looking at what people searched after other songs, and/or the machine learning approach of trying to put songs after one another and seeing what people skipped or turned to from the suggestions instead.
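As a rough sketch of that separation, assume each song has already been boiled down to a numeric feature vector (how you get those vectors is the hard part, and is not shown). The function names and both thresholds below are made up for illustration: anything above the 'identical' threshold is treated as the same song in disguise, and anything below the 'similar' threshold is treated as unrelated.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def similar_but_not_identical(seed, candidates, similar_min=0.80, identical_min=0.98):
    """Pick songs close to the seed, skipping near-duplicates.

    seed: feature vector of the song currently playing (hypothetical features).
    candidates: list of (name, feature_vector) pairs.
    Both thresholds are illustrative guesses, not tuned values.
    """
    playlist = []
    chosen_vectors = [seed]  # the seed counts as "already played"
    for name, vec in candidates:
        if cosine_similarity(seed, vec) < similar_min:
            continue  # not similar enough to the seed
        if any(cosine_similarity(vec, chosen) >= identical_min for chosen in chosen_vectors):
            continue  # effectively the same song (e.g. a remix) as one already chosen
        playlist.append(name)
        chosen_vectors.append(vec)
    return playlist
```

If the spike of identical songs really is well separated from the merely similar ones, the exact value of the upper threshold should not matter much.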

Similarly**, cleaning data of artifacts is still an open problem. It feels to me like a similar one. You’re trying to remove this *huge* signal which is overwhelming your sensors so you can get at what you actually care about. Assuming both the artifacts and the signal are within your detection limit***, you have to determine the nature of the artifact, both where it is in the signal spectrum, and what axes it spreads through and how. It might also have related harmonics****.

Another related problem is the removal of 60Hz***** noise from all sorts of electronics. I’m not sure what sorts of filters are used, but even band reject filters have non-ideal behaviour, so perhaps smoothing the edges in a known way works better, but this is all speculation. I mostly like using the field around power cords to test oscilloscopes and to get people to think about electric fields.
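For what it’s worth, one textbook option (not necessarily what any given piece of electronics actually uses) is a second-order IIR notch filter. The sketch below uses the widely circulated ‘Audio EQ Cookbook’ biquad notch formulas; the sample rate, the Q value, and the function names are all assumptions for illustration.

```python
import math


def notch_coefficients(notch_hz, sample_rate_hz, q=30.0):
    """Biquad notch coefficients (Audio EQ Cookbook form), normalized so a[0] == 1."""
    w0 = 2.0 * math.pi * notch_hz / sample_rate_hz
    alpha = math.sin(w0) / (2.0 * q)  # higher q gives a narrower notch
    a0 = 1.0 + alpha
    b = [1.0 / a0, -2.0 * math.cos(w0) / a0, 1.0 / a0]
    a = [1.0, -2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0]
    return b, a


def apply_filter(b, a, samples):
    """Run a second-order IIR filter (direct form I) over a list of samples."""
    out = []
    x1 = x2 = y1 = y2 = 0.0  # previous two inputs and outputs
    for x in samples:
        y = b[0] * x + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        out.append(y)
        x2, x1 = x1, x
        y2, y1 = y1, y
    return out


def rms(samples):
    """Root-mean-square amplitude of a list of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))
```

The non-ideal behaviour shows up as the settling time: a narrower notch (higher Q) disturbs neighbouring frequencies less, but takes longer to stop ringing after the signal changes.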

But back to artifact removal. I don’t have particular insights right now, outside of specific problems spaces. I just think it would be a really cool problem to work on (and one that people work on in a specific way all the time).

*Or perhaps something just similar enough that you’ve been paid enough to play.

**But not identically,

***My favourite procedure/process is the one I learned from an analytical chemist, which is that the signal has to be 3x the noise for you to consider it signal.

****I’m using signal processing as an analogy, but the concept is the same for other artifact removal, just different math.

*****50Hz across the pond

The Spoilers Become More Awake

Earlier, I talked a little about fear and redemption in The Force Awakens:

The Spoilers Awaken

This post is more a bunch of scattered thoughts…

The movie was all about Han Solo, and that was a good thing. Harrison Ford has really matured as an actor (I should see how he is in American Graffiti); you can see the gravitas, which smoothed out the ‘scruffy-haired nerf-herder*’.

There’s probably something about having actors of varying ages and maturity levels, and how it smooths things out. (Even though the young actors in this movie are more skilled (or better directed), they still have the very young energy, attractiveness, and rushing intensity, all of which can do better with guidance…)

‘Droids’ is an excellent example of good ‘in universe’ lingo**.

Seeing the characters old and the death of Han Solo was not just the passing of the torch to the next generation of Star Wars, but also perhaps a passing of the torch to us, that it’s time for us to step up (similar to when Jack Layton died)…

Leia’s dress with a New Republic neck was a nice touch.

Some people have said that Leia was not the most convincing actor, but her acting worked fine for me. Her scenes with Han were very touching, along with the scene near the end with Rey. I also found her convincing as a general, who ‘went back to what she knew the best’, and seemed to fit well in that role.

In a galaxy with hyperdrive and even reasonable astronomy and astrogation, how could you not tell where a sector was, if there was a map of it that included 5-10% of the galaxy? Even with 300 billion stars in a galaxy, you wouldn’t need very many to narrow down a sector, if the map had any reasonable level of accuracy…

So much regret for time past with problems remaining unresolved…Like Tron:Legacy…

Good use of X-wing quad lasers in ground combat against stormtroopers (apparently they added an under-blaster-cannon in the updated model for the movie), similar to R2-D2’s method for dealing with Joruus C’baoth (even a Jedi master cannot deflect starfighter-sized weapons, and/or they cannot predict what droids will do). Also, I liked the new X-wing colours. Apparently the shape is slightly different, but I didn’t notice that. http://starwars.wikia.com/wiki/T-65B_X-wing_starfighter#Behind_the_scenes

It was very fitting that the new death star reformed back into a sun…

The art department had many scenes of groups of aliens, just doing their thing, ‘world building’ as S says.

The establishing shots were really well done (you should do Comic Book Boot Camp http://comicbookbootcamp.com/).

The force continues to be weak in dealing with droids…The light side of the force more often appears with empathy, so they can use that to interact with droids.

A very tech-savvy force user…Anakin, perhaps Luke, for sure Rey…Either a force ability, or something about growing up on desert planets. If it’s a force ability, interesting that it allows much easier repairs and jury rigging, but not sensing or understanding the motivations of droids.

A small complaint about Cineplex showing spoilers in the opening ‘pre-movie games’

Also, the imperials just sound better with English accents.

Interesting the ‘order’ vs. ‘freedom’ contrast between ‘The First Order’ and ‘The Resistance’.

*Similar to how the last few vestiges of Garath the thief were the only differences between Belgarath the Sorcerer and Aldur…

**The counterexample I always use is ‘Argonians’ and ‘Khajiit’ in Oblivion, where no matter how racist the character, they always used the official names, which I always found jarring and unrealistic.

New Year, New Beginnings

2016. It feels like the future. As I write this, it is. I feel like this is the first time I’m really conscious of what I want to do next (at least from the perspective of having the ability to have that decision).

When this is published, this will be #8 in a row of daily posts. My goal is to keep this up indefinitely. I also have other resolutions which need a little more work before I share them. Anything you want me to write about this year? Comments below!

Agile Basics

Disclaimer: I currently work for a company. That company does Agile. From my limited experience, I think it does it well. I am not talking about that company in any way, shape, or form in this post.

I don’t recall when I first heard about Agile software development. I probably heard about it from Slashdot, when I was still reading it during grad. school.

First up, the Manifesto itself:

From: http://agilemanifesto.org/


We are uncovering better ways of developing
software by doing it and helping others do it.
Through this work we have come to value:

Individuals and interactions over processes and tools
Working software over comprehensive documentation
Customer collaboration over contract negotiation
Responding to change over following a plan

That is, while there is value in the items on
the right, we value the items on the left more.

Overall, it feels like a very human approach to software development. I’ve always enjoyed speculating about the situations which may have led to rules being formed*.

It also feels like it was written by a group of people who were actually interested in solving problems, perhaps by cutting through Gordian Knots of rigid process and planning.

Their conclusion was that problems will occur, things will change, and you want your process to accommodate that as much as possible, to enable and encourage people to talk about these earlier and more effectively. More of a ‘getting to yes’, rather than rear-covering or posturing.

Now, they further broke down the above four statements into 12 principles:

http://agilemanifesto.org/principles.html

1) “Our highest priority is to satisfy the customer
through early and continuous delivery
of valuable software.”

This one seems pretty self-evident, but there are a surprising number of people who have job descriptions at odds with or orthogonal to this, especially as organizations get larger.

2) “Welcome changing requirements, even late in
development. Agile processes harness change for
the customer’s competitive advantage.”

This might be the toughest (it was for me, and I consider myself good at this), as humans are naturally lazy and resistant to change.

Evolutionary Psychology : Laziness

I wonder if this can only be overcome through experience, by knowing how much more pain will happen if a particular shortcut is taken or some important stakeholder is ignored. (Even knowing that someone should be listened to instead of letting your lazy brain edit them out can be difficult.) Either way, a good argument for having at least one person (that you listen to!) with experience on the team.

But if you’re having requirements changing too often, your stories are likely not ready before you start, or they are too large, leading you to:

3) “Deliver working software frequently, from a
couple of weeks to a couple of months, with a
preference to the shorter timescale.”

‘Fail early, fail often’ goes the quote. Coupled with 2), it becomes much more difficult to be working on the wrong thing when you’re allowing for updated requirements every time you start a new short story**.

4) “Business people and developers must work
together daily throughout the project.”

I feel like the people who put this list together had experienced a lot of communication problems where they worked. This one is really helpful. There are few things more annoying than being ready to work on something but not being able to reach the person who needs to make the decision.

5) “Build projects around motivated individuals.
Give them the environment and support they need,
and trust them to get the job done.”

From the people I’ve talked to, this is every programmer’s dream. All the best working environments I’ve been in have been like this.

6) “The most efficient and effective method of
conveying information to and within a development
team is face-to-face conversation.”

Yes. And it drops off considerably, even to ‘face-to-face’ Skype/

7) “Working software is the primary measure of progress.”

Many people have different measures of progress, leading to organizational misalignment.

8) “Agile processes promote sustainable development.
The sponsors, developers, and users should be able
to maintain a constant pace indefinitely.”

As much fun as ‘feast and famine’ is, you’re probably not doing your best work souped up on adrenaline and coffee. (Or maybe you are. If so, you should take some time off between binges.)

9) “Continuous attention to technical excellence
and good design enhances agility.”

‘We put brakes on the car so that it can go faster.’ ‘Legacy code is defined as any code with inadequate test coverage.’ All the time you spend hesitating*** because you’re worried about breaking old crappy code is time you’re not building features or refactoring old crappy code.

10) “Simplicity–the art of maximizing the amount
of work not done–is essential.”

Think Apple. Think Oblivion when they decided to voice all of the dialog, and all the streamlining that inspired/required. Think about that super-expert programmer you know who can tell you why every single line of code they wrote is there, and also why everything not there is not there.

11) “The best architectures, requirements, and designs
emerge from self-organizing teams.”

Ask the people who know the most about something to make the decisions****.

12) “At regular intervals, the team reflects on how
to become more effective, then tunes and adjusts
its behavior accordingly.”

Continuous improvement. Read ‘The Goal*****’. Really. It will improve your life. Also retrospectives.

*Similar to reading Wikipedia and trying to figure out the actual events behind the dry descriptions…

**I still remember the day when I felt I had ‘graduated to short stories’, as they tend to pack more large ideas per page. Longer novels tend to be more meditative/escapatory?

***And working around crappy code…

****I’m currently most comfortable with some mix of Scrum and Kanban. They have a specific separation of powers between the team (handles estimation) and the product owner (handles prioritization). To me, this seems totally reasonable, but re-reading the manifesto above, there are many levels of Agile actualization above that. (Think Valve.) (Also, the product owner is part of the team, just with a specific role to play in addition to the general team tasks.)

*****https://en.wikipedia.org/wiki/The_Goal_%28novel%29 My favourite business book. Told in a novel format. Some of the story has dated references (before most of feminism), but is still very useful.

The Spoilers Awaken

This is my second post on The Force Awakens, this time with spoilers… (If you don’t want spoilers, you should go to the other post here: http://nayrb.org/~blog/?p=553)

From Anakin to Luke to Kylo Ren, the Star Wars movies are about the failed teaching of apprentices. It felt very poignant seeing the older Mark Hamill, with a beard, almost an echo of Alec Guinness’ haunted eyes. This suggests that movies VIII and IX may be the story of Luke Skywalker’s redemption as much as they may be Kylo Ren’s (in the same way that IV was Obi-Wan Kenobi’s redemption).

At the same time, it still feels like Luke ran away; at least it seems that way not knowing what has happened in the intervening time… Even if one of his students ran away and fell to the dark side, why would he not try again? What would make him flee that responsibility so resoundingly? Was it because he let his sister and best friend’s son fall to the dark side and kill all of his students?

Obi-Wan had more of an excuse, as the entire empire was after him; if he had stuck his head up, they would have sent out squads to kill him. But he had his redemption when he faced his fear/failed student.

“Fear is the path to the dark side. Fear leads to anger. Anger leads to hate. Hate leads to suffering.” -Yoda

So much fear on both their parts…You could say that the fear of Obi-Wan and Luke of their failed teaching and students led to much of the conflict in all seven movies so far…Who will break the cycle, to help adolescents actually grow up properly? (Or is this an endless part of the human condition?)

(It could also be, like Ty Templeton taught us in Comic Book Boot Camp http://comicbookbootcamp.com/, how you want to torture your heroes, to give them more depth, to give them more complex motivations, and in Star Wars, it’s often mistakes they’ve made in the past that they want to redeem.)

Speaking of redemption, it made Han Solo a much more interesting character to have him needing redemption for his perceived failures with his son. Also, this may be me projecting or reading things in, but it felt like there was some Harrison Ford wanting redemption for his terrible acting in the original trilogy. (Just after I wrote that, I read an article talking about how he re-wrote much of Han’s terrible dialogue to be more in tune with the character, and apparently also wanted Han Solo to die at the end of Jedi, to give the movie a ‘bottom’. So maybe it was the drama of Han Solo that he was trying to redeem, to finally give him some gravitas.) (Also, given how much he apparently put into the part, I feel bad complaining about his acting…Maybe it’s just that many actors are not that good at that age, or that directing has improved (or that George Lucas was much better at the art and setting than at script writing or directing actors*), or that he was being compared to Alec Guinness and Peter Cushing.)

Either way, fear and redemption is the catch phrase of Star Wars. Comment below!

*Apparently, Harrison Ford also contributed a lot to his character in American Graffiti, which was successful for George Lucas in part because he set the scene well with the setting and music from the period.

No Spoilers Awaken

Well, we just saw The Force Awakens, and it was really good. I think I can say ‘great’, possibly the best of all of them.

I’ve promised no spoilers for this review, so I’ll be focusing on other aspects.

First, the music was seamless and brilliant. It carried the mood superbly, through all of the lonely scenes, the poignant scenes, the battle scenes that Star Wars is known for.

Second, the movie felt really tight. It was paced well, it felt like it moved all the way through, that all the scenes needed to be there.

There were many sendups, but they weren’t jarring, they felt natural to the characters saying them (like the conversations about aging in The Undiscovered Country).

It felt very true to the feel of the Star Wars universe, while at the same time, being a great movie.

Also, I feel like Harrison Ford was much better, like he’s grown into himself.

In short, go see this film.

Cineplex: 100 Years

This trailer, “Cineplex – 100 Years of Movies” which currently shows at the beginning of all Cineplex movies (at least, the ones I go to), always makes me tear up:

There’s something about the nostalgia, the ‘humans trying so hard, with whatever they had at the time, all sharing the dream of flying’, going from the first tentative flights, to biplanes, to the first propeller planes where it was important enough to have retractable landing gear, to the first jets, to fighter jets, to the Space Shuttle*, to some type of Interstellar-like FTL ship.

It’s the “we’ve been here, helping you tell stories all the way through this, and we’ll still be here, helping you tell stories when we reach the stars.”

Even though multiple iterations of the planes are fighter planes, there is no violence in the trailer, and it feels very hopeful.

“If we do not destroy ourselves, we will one day venture to the stars.”

*Probably not quite on a trajectory the Shuttle would take, but that’s reasonable dramatic license.

Solution Rotation

So, sometimes when someone asks me a question, I feel like I’m rotating through a number of possible solutions/solution types, like rotating through different options in a leather punch. https://www.google.com/search?tbm=isch&q=leather+punch

I first noticed this in a conversation with Garland Marshall, one of my favourite profs. at WashU: https://biochem.wustl.edu/faculty/faculty/garland-marshall. He’d asked me a question about how one would determine the structure of a binding site of a molecule too difficult to crystallize, too large to NMR, and impossible to get a structure with a bound ligand.

How do you come up with the structure of the binding site? I remember rotating through a number of different options, mostly focused on polling the ligand in various ways.

– Does this happen to other people?
– Is there a neuronal definition/description of this?
– What does this mean?
– Other types of analogies?

On the ‘neuronal pathway’ front, it could be something like activating different pathways in sequence, doing it manually, rather than letting your brain activate all of them at the same time, then aggressively pruning them (to save energy). So, you would actively control your thoughts, to try out each channel independently, and submit them to more rigorous logic, to make sure you hadn’t left anything out. Somewhat like taking the ‘mental shackles off’, asking an audience ‘what ideas would your most creative and silly friend have about what to do with a brick?’, rather than ‘what ideas would you have about what to do with a brick*?’

*This seems to have been adapted from the Torrance Tests of Creative Thinking https://en.wikipedia.org/wiki/Torrance_Tests_of_Creative_Thinking

Problem Solving Examples (With some Machine Learning)

So, in a previous post, (http://nayrb.org/~blog/2015/12/25/automation-and-machine-learning/), we talked about some methods to help you decide whether you actually needed Machine Learning or not to solve your problem. This post talks about various problem solving approaches and which types of problems they can make tractable.

I started my career fascinated by protein folding and protein design. By the time I got there, they had narrowed the question down to one of search: ‘Given this physics-based scoring function, how do I find the optimal configuration of this molecule?’ There were a number of different techniques they were using: gradient descent, Monte Carlo, simulated annealing, but they all boiled down to finding the optimal solution to an NP-Complete problem.
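To make ‘search against a scoring function’ concrete, here is a minimal simulated annealing sketch. The one-dimensional `toy_score` function is a made-up stand-in for a physics-based scoring function (lower is better), and the cooling schedule, step size, and parameter names are arbitrary choices for illustration, nothing like a real folding code.

```python
import math
import random


def toy_score(x):
    """Hypothetical scoring function with two wells; lower is better."""
    return (x * x - 4.0) ** 2 + x


def simulated_annealing(score, start, steps=5000, step_size=0.5, start_temp=5.0, seed=0):
    """Minimize score(x) over a single number x, returning (best_x, best_score)."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    current, current_score = start, score(start)
    best, best_score = current, current_score
    for i in range(steps):
        # Linear cooling: start hot (accept many uphill moves), end nearly greedy.
        temp = start_temp * (1.0 - i / steps) + 1e-9
        candidate = current + rng.uniform(-step_size, step_size)
        candidate_score = score(candidate)
        delta = candidate_score - current_score
        # Metropolis criterion: always accept improvements, sometimes accept worse.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current, current_score = candidate, candidate_score
            if current_score < best_score:
                best, best_score = current, current_score
    return best, best_score
```

Calling `simulated_annealing(toy_score, start=3.0)` returns the best configuration seen and its score. The occasional acceptance of worse moves is what lets the search climb out of a local well, which plain gradient descent cannot do.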

As we know that biological systems can perform protein folding quickly, there must be some algorithm which can do this (even if it means simulating each individual electron). This can then be restated as a simulation/decision question, from the perspective of a cell/physics. Many other search problems have similar human-like or physics-like easier solutions (ways of finding the NP-Complete verifier). For example, as a traveling salesperson, you would look at the map, and be able to narrow down the routes to some smaller number, or be able to quickly narrow down the options to a small number of sets of routes.

In many ways, this is the ‘holy grail’ of Machine Learning, the ability for a machine to step away from what we tell it, and to be able to solve the problem in a more direct way. Heuristics are an attempt to solve this problem, but they’re always somewhat rules-based.
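As an example of how rules-based a heuristic can be, here is a sketch of the classic nearest-neighbour rule for the traveling salesperson example: always go to the closest city you have not visited yet. The city names and coordinates are invented, and the route it finds is usually decent but not guaranteed to be optimal.

```python
import math


def nearest_neighbour_route(cities, start):
    """Greedy route: from each city, move to the closest unvisited one.

    cities: dict mapping a city name to its (x, y) position (made-up data).
    Ties are broken alphabetically so the result is deterministic.
    """
    unvisited = set(cities) - {start}
    route = [start]
    while unvisited:
        here = cities[route[-1]]
        nearest = min(unvisited, key=lambda name: (math.dist(here, cities[name]), name))
        route.append(nearest)
        unvisited.remove(nearest)
    return route
```

The rule never looks ahead, so it can paint itself into a corner, which is exactly the sort of thing a human glancing at the map would avoid.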

Next is clustering, best used for differentiating between different groups of things so that you can make a decision. My favourite is ‘Flow Cytometry’ https://en.wikipedia.org/wiki/Flow_cytometry, where you’re trying to differentiate different groups of cells, basically through clustering on a 2-D graph of the brightness of various fluorescent cell markers.

Customer persona clustering is another example, such as you might do for segmentation, where standard groups like age or location would not be good enough.
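A minimal sketch of that kind of 2-D clustering is below, using plain k-means. This is only one of many clustering methods and I am not claiming it is what flow cytometry software uses; the function name, the farthest-point starting guess, and the fixed iteration count are all assumptions for illustration.

```python
import math


def kmeans(points, k, iterations=20):
    """Cluster 2-D points into k groups; returns (labels, centroids).

    points: list of (x, y) tuples, e.g. brightness of two fluorescent markers.
    """
    # Farthest-point start: first point, then whichever point is farthest
    # from every centroid chosen so far. Deterministic, no random seed needed.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda i: math.dist(p, centroids[i])) for p in points]
        # Update step: each centroid moves to the mean of its members.
        for i in range(k):
            members = [p for p, label in zip(points, labels) if label == i]
            if members:
                centroids[i] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels, centroids
```

The decision part still belongs to you: the algorithm hands back groups, and a human (or a later pipeline stage) decides what each group means.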

Machine Learning problems such as the Netflix challenge http://www.netflixprize.com/, where you want a large degree of accuracy in your answer, require the use of a number of techniques. (The problem was to take a list of customer movie ratings and predict how those customers would rate other movies.)

First, you need to clean and normalize the data. The authors were also able to separate the general opinion of each movie from the specific opinion each person had about each movie. (Each of these was about as important to the overall result.) Each of these normalizations or bias removals would likely have been done with some form of machine learning, suggesting that any comprehensive usage would require multiple pipelines or channels, probably directed by some master channels* learning from which of them were the most effective.
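One simple way to picture that separation is a baseline predictor: overall average rating, plus how much better or worse a movie does than average, plus how generous or harsh a person is. This is a common starting point in recommender write-ups, sketched here under my own assumptions; it is not the prize-winning method, and the names and data layout are invented for illustration.

```python
from collections import defaultdict


def fit_baseline(ratings):
    """Fit (global_mean, movie_bias, user_bias) from (user, movie, rating) tuples."""
    global_mean = sum(r for _, _, r in ratings) / len(ratings)

    # General opinion of each movie: its average offset from the global mean.
    movie_offsets = defaultdict(list)
    for _, movie, r in ratings:
        movie_offsets[movie].append(r - global_mean)
    movie_bias = {m: sum(v) / len(v) for m, v in movie_offsets.items()}

    # Each person's tendency, after the movie's general opinion is removed.
    user_offsets = defaultdict(list)
    for user, movie, r in ratings:
        user_offsets[user].append(r - global_mean - movie_bias[movie])
    user_bias = {u: sum(v) / len(v) for u, v in user_offsets.items()}

    return global_mean, movie_bias, user_bias


def predict(model, user, movie):
    """Predicted rating; unknown users or movies fall back to a bias of zero."""
    global_mean, movie_bias, user_bias = model
    return global_mean + movie_bias.get(movie, 0.0) + user_bias.get(user, 0.0)
```

Whatever is left over after subtracting these biases is the specific opinion a person has about a specific movie, which is the part the fancier techniques go after.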

I wonder how much of what we do as humans involves breaking down the problem, to divide and conquer. When we’re asked for a movie recommendation, do we think of good movies first, then what that person would think of? Personally, I feel I get my best results when I try to put myself in that person’s shoes, suggesting there may be a long way still to go.

Perhaps looking at groups of movies, or some sort of tagging, to get at whatever ‘genes’ may be underneath, as you may like certain things about movies which are only imperfectly captured by how people like them similarly. (Or perhaps, the data is big enough to capture all of this. It’s fun to speculate. 😀 )

*This suggests a hierarchy, which is only one way of seeing the structure. Other views are possible, but outside the scope.

Automation and Machine Learning

When we ask a computer for help with a task, what are we asking for?

1) Help with automating a repetitive task
2) Help with a decision

1) Help with automating a repetitive task
There are various ways you can automate a repetitive task. You can:
a) Ask your computer to do the same thing again and again, regardless of input (display the home page)
b) Give it some simple rules to follow (if they try to navigate to a non-existent page, show them a 404)
c) Give it some complex or not fully understood rules to follow (based on our tests, these are the solutions you should attempt, in this order)
d) Give it inputs, and have it adapt (‘Watch me perform this industrial assembly task, now you do it’)
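Options a) and b) are simple enough to sketch directly. The page table and function names below are hypothetical, just enough to show the difference between ignoring the input and following a simple rule about it.

```python
# Hypothetical site content, keyed by request path.
PAGES = {"/": "home page", "/about": "about page"}


def always_home(_path):
    """Option a): do the same thing regardless of input."""
    return 200, PAGES["/"]


def simple_rules(path):
    """Option b): a simple rule, known pages are served and anything else is a 404."""
    if path in PAGES:
        return 200, PAGES[path]
    return 404, "not found"
```

Options c) and d) are where the rules stop fitting in a couple of lines, which is roughly where the rest of this post picks up.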

2) Help with a decision
There are various ways you can use a computer to help with a decision. You can:
a) Display data in various interesting ways (Data Visualization)
b) Give it the data and some rules to follow (Standard decision automation)
c) Give it the data and a desired output/scoring function (Supervised/Reinforcement Learning)
d) Give it the data and nothing else* (Unsupervised Learning)
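The jump from b) to c) can be sketched with a single threshold. In b) a human picks the threshold; in c) you hand over labelled data plus a scoring function (here, plain accuracy) and let the computer search for the threshold. The function names and the one-number ‘model’ are assumptions to keep the example tiny.

```python
def rule_based_decision(value, threshold=5.0):
    """Option b): a human supplies the rule (the threshold is hand-picked)."""
    return value >= threshold


def accuracy(threshold, values, labels):
    """Scoring function: fraction of examples the threshold classifies correctly."""
    correct = sum((v >= threshold) == bool(label) for v, label in zip(values, labels))
    return correct / len(values)


def learn_threshold(values, labels):
    """Option c): try candidate thresholds and keep the best-scoring one.

    Candidates are midpoints between neighbouring distinct values, so this
    assumes the data contains at least two distinct values.
    """
    ordered = sorted(set(values))
    candidates = [(low + high) / 2.0 for low, high in zip(ordered, ordered[1:])]
    return max(candidates, key=lambda t: accuracy(t, values, labels))
```

Option d) has no labels and no scoring function to search against, which is why it is so much harder to sketch in a few lines.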

This is somewhat of a false dichotomy, as adding new types of decisions allows more and more automation.

– Search (inputting words, pictures, video into a search engine and asking for a result) generally started with 2.a) (Data Display), and seems to be trying to move up the decision hierarchy, anticipating questions and the rules the user would want it to follow. This seems to be generally done with statistics, but I expect this would be switching over to pattern-finding neural nets.
– Clustering (throwing a bunch of data into the hopper and getting groupings back) is also mostly in the Data Visualization bucket. It could also be an input into a machine learning algorithm, which would then be trained to make decisions based on these clusters.
– Machine Learning (giving a bunch of data and getting a decision or pattern out) can be used for most or all of the options above, and similar to how computers have gotten ‘fast enough’, Machine Learning is becoming ‘good enough’ or ‘easy enough’ to replace many of the above.

So, as a human, when do you choose each of these? Assuming the options get more difficult going down the list, you would:

1) Start by googling various things (mostly to see what has been done before***).
2) You could then look at the data, clean it, and try clustering it into groups, to see if any of them made sense for the decision you wanted to make.
3) If neither of these worked, or if you wanted more, you could derive a scoring function for the output you wanted, then supply a Machine Learning algorithm with a substantial amount of data, and see how optimal it could make the decision.
4) If you don’t even know what decision you want, or are having difficulty making a scoring function, you could throw the data into an unsupervised learning hopper and see what comes out.

At each of the steps above, you can hive off parts and automate them, either using rules derived from the patterns you’ve found, or using flexible rules from the Machine Learning algorithm. You may find you can accomplish most of your task without having to resort to complex or incompletely understood algorithms.

More examples in subsequent posts. Stay tuned. As you can tell, the categories above have not fully crystallized.

*Unsupervised Learning has a number of levels** in it, such as ‘Find Features’, ‘What is the Question?’, ‘Why?’, etc…

**Not that everything is hierarchical, but this is convenient for discussion

***This is the ‘literature review’ portion of anything we do now