How do you make a playlist of songs which are similar, but not identical? Ideally, you want to play music that the user is likely to want to listen to*, but you probably don’t want to play the same song over and over, even in different remixes. So, how do you detect similarities while removing the identicals, even when they may not be so identical?
In practice, there is probably a lot of separation between the spike of identical songs and those that are merely similar. You could also use the Web 2.0 crutch of looking at what people searched for after other songs, and/or the machine learning approach of putting songs after one another and seeing what people skipped, or what they turned to instead of the suggestions.
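As a rough sketch of that separation idea, suppose you already have a similarity score between a seed song and each candidate, on a scale from 0 to 1. The scores, the song titles, and both thresholds (`similar_floor` and `duplicate_cutoff`) are made up for illustration; where the cut-offs really belong would depend on how the scores are computed.

```python
def pick_similar_not_identical(scores, similar_floor=0.6, duplicate_cutoff=0.95):
    """Keep songs that are similar to the seed but not near-duplicates.

    scores: dict mapping song title -> similarity to the seed song, in [0, 1].
    Both thresholds are hypothetical and would need tuning on real data.
    """
    kept = [
        (title, score)
        for title, score in scores.items()
        if similar_floor <= score < duplicate_cutoff
    ]
    # Most similar first, so the playlist drifts away from the seed gradually.
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return [title for title, _ in kept]


# Invented scores: the remixes sit in a spike near 1.0, well above the rest.
scores = {
    "Seed Song (Radio Edit)": 0.99,
    "Seed Song (Club Remix)": 0.97,
    "Neighbour Track A": 0.82,
    "Neighbour Track B": 0.71,
    "Unrelated Track": 0.20,
}
playlist = pick_similar_not_identical(scores)
print(playlist)
```

The whole approach leans on the gap between the spike and everything else; if the remixes scored 0.85 instead, a fixed cut-off like this would let them through.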
Similarly**, cleaning data of artifacts is still an open problem. It feels to me like a similar one. You’re trying to remove this *huge* signal which is overwhelming your sensors so you can get at what you actually care about. Assuming both the artifacts and the signal are within your detection limit***, you have to determine the nature of the artifact, both where it is in the signal spectrum, and what axes it spreads through and how. It might also have related harmonics****.
Another related problem is the removal of 60Hz***** noise from all sorts of electronics. I’m not sure what sorts of filters are used, but even band-reject filters have non-ideal behaviour, so perhaps smoothing the edges in a known way works better. This is all speculation. I mostly like using the field around power cords to test oscilloscopes and to get people to think about electric fields.
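For what a band-reject filter looks like in practice, here is a minimal sketch of a second-order digital notch, using the biquad notch coefficients from Robert Bristow-Johnson's Audio EQ Cookbook. This is one textbook option, not a claim about what any particular piece of electronics does. The sample rate, the Q value, and the 5Hz "real" signal are all assumptions chosen for the demonstration.

```python
import math


def notch_coefficients(f0, fs, q):
    """Biquad notch centred on f0 (Hz) at sample rate fs (Hz), normalised so a[0] == 1."""
    w0 = 2.0 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b = [1.0 / a0, -2.0 * math.cos(w0) / a0, 1.0 / a0]
    a = [1.0, -2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0]
    return b, a


def apply_biquad(b, a, samples):
    """Run the filter over the samples (direct form I, zero initial state)."""
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x0 in samples:
        y0 = b[0] * x0 + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        out.append(y0)
        x2, x1 = x1, x0
        y2, y1 = y1, y0
    return out


def tone_amplitude(samples, freq, fs):
    """Estimate the amplitude of one frequency by correlating with sine and cosine."""
    n = len(samples)
    re = sum(s * math.cos(2.0 * math.pi * freq * i / fs) for i, s in enumerate(samples))
    im = sum(s * math.sin(2.0 * math.pi * freq * i / fs) for i, s in enumerate(samples))
    return 2.0 * math.hypot(re, im) / n


fs = 1000.0  # assumed sample rate in Hz
t = [i / fs for i in range(4000)]
# A 5Hz signal of interest buried under mains hum of equal amplitude.
noisy = [math.sin(2.0 * math.pi * 5.0 * ti) + math.sin(2.0 * math.pi * 60.0 * ti) for ti in t]

b, a = notch_coefficients(60.0, fs, q=30.0)
cleaned = apply_biquad(b, a, noisy)

# Measure only the last second, after the filter's start-up transient has died away.
tail = cleaned[-1000:]
hum_left = tone_amplitude(tail, 60.0, fs)
signal_left = tone_amplitude(tail, 5.0, fs)
print(hum_left, signal_left)
```

The non-ideal behaviour shows up in the trade-off on `q`: a higher value gives a narrower notch that disturbs less of the neighbouring spectrum, but the filter rings for longer before it settles, and it misses the hum entirely if the mains frequency drifts.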
But back to artifact removal. I don’t have particular insights right now, outside of specific problem spaces. I just think it would be a really cool problem to work on (and one that people work on in a specific way all the time).
*Or perhaps something just similar enough that you’ve been paid enough to play.
**But not identically.
***My favourite procedure/process is the one I learned from an analytical chemist, which is that the signal has to be 3x the noise for you to consider it signal.
****I’m using signal processing as an analogy, but the concept is the same for other artifact removal, just different math.
*****50Hz across the pond