Computer Vision Research: The deep "depression"

Well, I am not that old, but I have been involved with computer vision for almost two decades now. I started publishing papers when about 250 papers were submitted per year to the major and most selective conferences in computer vision (ICCV, CVPR, ECCV). At that time the conference boards comprised roughly 60-80 people and there were 300-400 participants.

Computer vision conferences (even up to 2010) were organized around a number of thematic areas, reasonably well represented both in terms of content and in terms of approaches: early vision, grouping/segmentation, motion analysis/tracking, recognition, and 3D vision are some examples. Statistics, geometry, and optimization were present in almost all of these areas, and one could get a grasp of the field, a global view, through participation in such a conference. Entering the vision field required a reasonable understanding of physics, math, statistics, and geometry. Attending a conference exposed you to computer vision challenges as well as to approaches.

There were always trends and dominant topics in the field. I guess the eighties were all about stereo, the nineties were all about continuous methods and segmentation/grouping, while the turn of the century brought in discrete methods and refocused the community on recognition and descriptors. In parallel, the machine learning community stepped in, and its recent developments made their way into computer vision. Having said that, despite the presence of dominant topics the field remained quite diverse, and alternative ideas could still sneak in across almost all sub-domains of computer vision.

Well, I have the impression that this is far from being the case anymore. Research now focuses on complex deep-learning engineering pipelines to address computer vision tasks. 80-90% of the papers published at conferences, and almost all oral papers, come from this area. There is absolutely nothing wrong with having such papers, and their performance definitely justifies their value; however, one can question what the "added" scientific value is. Other than a handful of people doing fundamental research towards understanding the theoretical foundations of these methods, almost the entire community now seems to target the development of ever more complex pipelines (which most likely cannot be reproduced from the elements presented in the paper), which in most cases have almost no theoretical reasoning behind them and add 0.1% of performance on a given benchmark. Is this the objective of academic research? Putting in place highly complex engineering models that simply exploit computing power and massive annotated data? The community (and I guess every community) was chasing benchmarks and low-hanging fruit in the past too, but at that time there was room for other directions as well, which no longer seems to be the case. This holds not only for conferences but for funding as well, and its direct consequence is a rapid decrease of the "theoretical depth" of research in the field, or, I should say instead, of research diversity.

It might simply be that deep learning on highly complex graphs, enormous in terms of degrees of freedom, once endowed with massive amounts of annotated data and computing power that was unthinkable until very recently, can solve all computer vision problems. If this is the case, then it is simply a matter of time before industry takes over (which already seems to be happening), research in computer vision becomes a marginal academic objective, and the field follows the path of computer graphics (in terms of activity and volume of academic research).

If not, though, one can ask how computer vision will move to the next level. How will new ideas emerge from a community where incoming PhD students have never heard, and most likely will never hear, about statistical learning, pattern recognition, Euclidean geometry, continuous and discrete optimization, and so on? I am a believer in a broad and rich scientific culture, and I have the impression that it is in the process of disappearing from the field. One can envision two possible interpretations. There is a highly positive one: we are converging towards David Marr's famous theory, which assumes that a single computational framework can address visual perception. This would be a great accomplishment for a field that was at 5% accomplishment in 1995 (recall Prof. Thomas Huang's presentation at the ICPR'96 conference). There is a less positive interpretation, though, in which we are putting all our efforts, while excluding alternatives, into an area that shows great promise but will still not be able to address on its own the rich variety of problems in computer vision.

A very good friend once mentioned to me that there are three stages of deep learning: denial, doubt, and acceptance/adoption! I guess I am navigating the ocean between the last two stages without a compass.

follow me on twitter: @agonigrammi

During the Ph.D. proposal, the student said he's gonna do machine learning. Then I said, "So.. the machine will learn, ye? But.. then.. what are YOU gonna learn?"

Neil Robertson

Computer Vision Technologist, Serial Tech Founder

6y

Nikos, we have a paper accepted to ICCV that shows - in the really challenging cases - that combining attributes (non-DL) with DL features actually outperforms direct convnet computation for the general face recognition problem. This isn't a "push back" on our part, it's what works. To that end we have a special session at Face and Gesture 2018 which calls for explicit exploration of where DL and non-DL features are successful in face recognition.
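
As a rough illustration of the kind of attribute/DL fusion described here, consider a minimal sketch: hand-crafted attribute scores are normalized and concatenated with a deep embedding before classification. The function name and feature dimensions below are hypothetical, not the actual pipeline from the ICCV paper:

```python
import numpy as np

# Minimal sketch of attribute + deep-feature fusion (hypothetical, not the
# actual ICCV pipeline): L2-normalize each modality, then concatenate the
# two vectors into a single descriptor for a downstream classifier.
def fuse_features(attributes: np.ndarray, deep_embedding: np.ndarray) -> np.ndarray:
    a = attributes / (np.linalg.norm(attributes) + 1e-8)
    d = deep_embedding / (np.linalg.norm(deep_embedding) + 1e-8)
    return np.concatenate([a, d])

# Example: 40 attribute scores fused with a 512-d convnet embedding.
fused = fuse_features(np.random.rand(40), np.random.rand(512))
print(fused.shape)  # (552,) -- fed to any downstream classifier
```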

If we have the same good friend: it's denial, doubt, and obvious; but I'm still between the first two stages ;)

I've been looking at deep learning and CV for the first time this weekend, and that is exactly the sense I got from reading a few papers and going through many websites devoted to it. I thought maybe it was my own "culture shock" from exploring a different field for the first time, but perhaps not.

Phil Teare

Head of Machine Perception | Centre for AI | Data Science & AI | BP R&D at AstraZeneca

6y

I suspect the analogy will trend towards biology rather than physics. Still science. Still a lot of empirical work needed. But less focus on the minuscule/abstract and more on macro systems and the practical study of very complex systems. But the small details (just as in biology) will still matter a great deal. Do we belittle medical science for running large trials on the effectiveness of drugs, rather than focusing purely on theory and design? No. Industry and large-scale technology will play an ever increasing role, but we still see excellent fundamental work (e.g. SELU, just last week).
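
For readers who have not come across it, "SELU" refers to the scaled exponential linear unit from the self-normalizing networks paper (Klambauer et al., 2017). A minimal NumPy sketch of the activation itself, using the published constants (the wrapper code is just illustrative):

```python
import numpy as np

# SELU (scaled exponential linear unit), Klambauer et al., 2017.
# The two constants are the published values that yield the
# self-normalizing property.
ALPHA = 1.6732632423543772
SCALE = 1.0507009873554805

def selu(x: np.ndarray) -> np.ndarray:
    # scale * x for x > 0, scale * alpha * (exp(x) - 1) otherwise
    return SCALE * np.where(x > 0, x, ALPHA * (np.exp(x) - 1.0))

print(selu(np.array([-1.0, 0.0, 2.0])))  # approx [-1.1113, 0.0, 2.1014]
```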
