Karakame is a camera app for iOS. Sorry, Android guys, iOS only. The app takes 5 pictures with 3 seconds in between. After adjusting for small movements of the camera, it picks, for each pixel position, the median value across the five images. The effect is that when you point it at a scene where people walk in and out, those people are removed from the aggregate picture.
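To make the median trick concrete, here is a minimal Python sketch on tiny grayscale "images" stored as nested lists. The function name median_stack and the toy data are made up for illustration; the app itself works on aligned color bitmaps through OpenCV, so treat this as a picture of the idea rather than the actual code.

```python
from statistics import median

def median_stack(frames):
    """Per-pixel median over a list of equally sized grayscale frames."""
    height, width = len(frames[0]), len(frames[0][0])
    return [
        [median(frame[y][x] for frame in frames) for x in range(width)]
        for y in range(height)
    ]

# Five 1x3 frames of a static background (value 10). A "person" (value 200)
# walks through and covers a different pixel in three of the frames.
frames = [
    [[200, 10, 10]],
    [[10, 200, 10]],
    [[10, 10, 200]],
    [[10, 10, 10]],
    [[10, 10, 10]],
]
result = median_stack(frames)
```

Every pixel is covered by the person in only one of the five frames, so the median at each position is the background value and the person disappears from the result.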
It works reasonably well. The app is by no means a replacement for the main camera app; it is more a proof of concept. It seems like the sort of thing mainstream camera apps should add. If you have an app like that, you can get the source for this at https://github.com/DOsinga/Karakame.
Karaoke famously means "Empty Orchestra" in Japanese - hauntingly beautiful. Except that it doesn't quite. Kara means empty (see also Karate - empty hand), but the "oke" bit is just taken from the English "orchestra". So I called the app Karakame, from the almost Japanese for "empty camera".
Some notes on the implementation. The app uses OpenCV, which you can quite easily integrate into iOS these days. I extracted the interoperability code into an OpenCVBitmap class, so have a look if you're interested in that sort of thing. The image stabilization works really well. I normalize to the middle bitmap (i.e. the third one if you take five pictures). Because of the stabilization, some of the border pixels will be missing from some of the pictures, but since we pick the median pixel value, most of the time we'll still have values from the other bitmaps.
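The border handling can be sketched like this in Python. It assumes a missing pixel is represented as None and is simply skipped when voting; that representation and the function name median_ignoring_missing are assumptions for this example, not how the app's OpenCV code stores things.

```python
from statistics import median_low

def median_ignoring_missing(values):
    """Median of the values that are present; None marks a missing pixel."""
    present = [v for v in values if v is not None]
    if not present:
        return None  # no frame covers this position after alignment
    # median_low always returns one of the observed values, even when an
    # even number of frames is left after dropping the missing ones.
    return median_low(present)

# A border pixel that fell outside two of the five aligned frames.
border_pixel = median_ignoring_missing([None, 12, 10, 11, None])

# Only one frame is missing here, so four values are left to vote.
edge_pixel = median_ignoring_missing([None, 10, 12, 14, 16])
```

As long as at least one of the aligned frames covers a position, the output has a value there; only positions missing from every frame stay empty.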
I also experimented with object detection. OpenCV comes with a set of detectors called Haar cascades that can detect faces, cars and people - no deep learning needed. It works well for face detection, but for cars and people I didn't get many detections. The idea was to leave pixels inside rectangles that were classified as cars or people out of the median voting, but I took that out again.
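For the curious, the idea of keeping detected rectangles out of the vote could look roughly like the Python sketch below. It assumes each frame comes with a list of (x, y, width, height) rectangles from some detector; the names masked_median and inside are hypothetical, and this is a reconstruction of the idea, not the code that was removed from the app.

```python
from statistics import median_low

def inside(rect, x, y):
    """True if pixel (x, y) lies inside rect = (x, y, width, height)."""
    rx, ry, rw, rh = rect
    return rx <= x < rx + rw and ry <= y < ry + rh

def masked_median(frames, detections):
    """frames: 2D grayscale lists; detections: one list of rects per frame."""
    height, width = len(frames[0]), len(frames[0][0])
    out = []
    for y in range(height):
        row = []
        for x in range(width):
            # Only frames without a detection at this pixel get a vote.
            votes = [
                frame[y][x]
                for frame, rects in zip(frames, detections)
                if not any(inside(r, x, y) for r in rects)
            ]
            if not votes:
                # Every frame was masked here: fall back to a plain median.
                votes = [frame[y][x] for frame in frames]
            row.append(median_low(votes))
        out.append(row)
    return out

# Someone (value 200) lingers on pixel 0 in two of three frames, so a plain
# median would keep them. The detector flags them in both frames.
frames = [[[200, 10]], [[200, 10]], [[10, 10]]]
detections = [[(0, 0, 1, 1)], [(0, 0, 1, 1)], []]
result = masked_median(frames, detections)
```

This is where masking would help: a person who stands still for most of the shots survives a plain median, but not one where the flagged pixels are left out of the vote.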
Finally, the median pixel implementation. Calculating medians in higher dimensions is expensive, so I decided to just calculate the medians for the red, green and blue channels separately. This could lead to weird results, since the combined color need not occur in any of the pictures, but in my testing it did seem ok. I suppose I could do a little better by calculating the median for the three colors and, where there is a disagreement, picking whichever candidate pixel has the smallest distance to the other candidates.
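Here is one possible reading of that refinement as a Python sketch. It takes "disagreement" to mean that the three channel medians come from different pictures, and it measures distance as squared RGB distance; both are assumptions, as are the names channel_median, median_index and pick_pixel.

```python
from statistics import median_low

def channel_median(pixels):
    """Independent median per channel; may match none of the input pixels."""
    return tuple(median_low(channel) for channel in zip(*pixels))

def median_index(values):
    """Index of the frame that holds the median of an odd-length list."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    return order[len(values) // 2]

def pick_pixel(pixels):
    """pixels: one (r, g, b) tuple per frame, all at the same position."""
    # One candidate frame per channel: the frame supplying that median.
    candidates = [median_index(channel) for channel in zip(*pixels)]
    if len(set(candidates)) == 1:
        return pixels[candidates[0]]  # all three channels agree

    def distance_to_others(i):
        # Summed squared RGB distance to every candidate; a frame chosen
        # by two channels is counted twice, which works in its favor.
        return sum(
            sum((a - b) ** 2 for a, b in zip(pixels[i], pixels[j]))
            for j in candidates
        )

    # Disagreement: keep the candidate closest to the other candidates.
    return pixels[min(sorted(set(candidates)), key=distance_to_others)]

pixels = [(10, 14, 13), (12, 10, 60), (60, 12, 10), (0, 0, 0), (255, 255, 255)]
mixed = channel_median(pixels)   # (12, 12, 13): a color from no single frame
chosen = pick_pixel(pixels)      # (10, 14, 13): an actual pixel from frame 0
```

The per-channel median mixes values from three different frames, while the refined version always returns a pixel that really occurred in one of the pictures.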