Great overview. Clarifai is certainly extremely impressive.
Do you play the tablas? My wife and I studied sitar but our instrument was destroyed by the movers in our latest relocation to Shenzhen :( The tabla teacher where we studied was able to play a very complex taal while chewing betel nut and rolling his eyes back in their sockets, immediately switch to a pitch-, bend- and time-perfect rendition of the 'Pink Panther' melody, then switch back to a very complex taal without skipping a beat. Brilliant to see.
Yeah, Clarifai did well. I am keen to learn how well their custom model feature works. Per their FAQ[0], you only need to supply 20-50 images per concept. That seems remarkable to me, given that a concept like 'cow' has ~1500 images on ImageNet[1]. Perhaps they are using some sort of transfer learning to facilitate this? I.e. using a pretrained model, and then either retraining only the last few fully connected layers or retraining parts of the entire network?
I am not a deep learning practitioner, but I would be curious to hear from experts how their custom model feature might work, and from any of their users how well it actually does.
Tablas: haha, great description of your teacher. I do play, with enthusiasm, but poorly. For those in Seattle, there is an amazing teacher who teaches up on Cap Hill [2].
It is not necessary to train things from scratch; you take the largest ImageNet model available and fine-tune it for the task. This way it reuses much of the lower layers, which have already seen lots of data.
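To make the idea concrete, here is a toy sketch of the "freeze the lower layers, retrain the top" approach, using only the Python standard library. Everything in it is an assumption for illustration: `BACKBONE` is a hand-written fixed projection standing in for a real pretrained ImageNet network, the data is synthetic 2-D points rather than images, and all function names are hypothetical. It says nothing about how Clarifai actually implements custom models.

```python
import math
import random

# Hypothetical stand-in for pretrained lower layers: a fixed projection
# followed by ReLU. A real system would use a network trained on ImageNet.
BACKBONE = (
    (1.0, 0.0),
    (0.0, 1.0),
    (-1.0, 0.0),
    (0.0, -1.0),
)

def frozen_features(x):
    """Map a raw input to features using weights that are never updated."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in BACKBONE]

def make_examples(rng, n_per_class):
    """Toy data: class 1 clusters near (2, 2), class 0 near (-2, -2)."""
    examples = []
    for label, centre in ((1, 2.0), (0, -2.0)):
        for _ in range(n_per_class):
            x = [centre + rng.uniform(-0.5, 0.5) for _ in range(2)]
            examples.append((x, label))
    rng.shuffle(examples)
    return examples

def train_head(examples, epochs=200, lr=0.1):
    """Fit only the final logistic layer with plain SGD; the backbone stays fixed."""
    feats = [(frozen_features(x), y) for x, y in examples]
    w = [0.0] * len(BACKBONE)
    b = 0.0
    for _ in range(epochs):
        for f, y in feats:
            z = sum(wi * fi for wi, fi in zip(w, f)) + b
            p = 1.0 / (1.0 + math.exp(-z))
            grad = p - y  # derivative of the log loss with respect to z
            w = [wi - lr * grad * fi for wi, fi in zip(w, f)]
            b -= lr * grad
    return w, b

def predict(x, w, b):
    """Classify one raw input with the frozen backbone plus the trained head."""
    f = frozen_features(x)
    return 1 if sum(wi * fi for wi, fi in zip(w, f)) + b > 0 else 0

rng = random.Random(0)
train = make_examples(rng, 20)     # 20 examples per concept, the FAQ's lower bound
held_out = make_examples(rng, 50)  # fresh samples the head never saw
w, b = train_head(train)
accuracy = sum(predict(x, w, b) == y for x, y in held_out) / len(held_out)
print(f"held-out accuracy: {accuracy:.2f}")
```

The point of the sketch is only that the head has five trainable numbers (four weights and a bias), so a few dozen examples are enough to fit it once the features are already useful. With a real pretrained network the same split applies, though how many layers to unfreeze is a judgment call that depends on how far the new task is from the original training data.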