Computer Vision Research: The deep "depression"

Well, I am not that old, but I have been involved with computer vision for almost two decades now. I started publishing papers when about 250 papers were submitted per year to the major and most selective conferences in computer vision (ICCV, CVPR, ECCV). At that time the conference boards comprised roughly 60-80 people and there were 300-400 participants.

Computer vision conferences (even up to 2010) were organized around a number of thematic areas, reasonably well represented both in terms of content and in terms of approaches: early vision, grouping/segmentation, motion analysis/tracking, recognition, and 3D vision are some examples. Statistics, geometry, and optimization were present in almost all of these areas, and one could get a grasp of the field, a global view, through participation in such a conference. Entering the vision field required a reasonable understanding of physics, math, statistics, and geometry. Attending a conference exposed you to computer vision challenges as well as to approaches.

There were always trends and dominant topics in the field. I guess the eighties were all about stereo, the nineties were all about continuous methods and segmentation/grouping, while the turn of the century brought in discrete methods and refocused the community on recognition and descriptors. In parallel, the machine learning community stepped in, and its recent developments made their way into computer vision. Having said that, despite the presence of dominant topics the field remained quite diverse, and alternative ideas could still sneak in across almost all sub-domains of computer vision.

Well, I have the impression that this is far from being the case anymore. Research now focuses on complex deep-learning engineering pipelines to address computer vision tasks. 80-90% of the papers published at conferences, and almost all oral papers, come from this area. There is absolutely nothing wrong with having such papers, and their performance definitely justifies their value; however, one can question what the "added" scientific value is. Other than a handful of people doing fundamental research towards understanding the theoretical foundations of these methods, almost the entire community now seems to target the development of ever more complex pipelines (which most likely cannot be reproduced from the elements presented in the paper), which in most cases have almost no theoretical reasoning behind them and add 0.1% of performance on a given benchmark. Is this the objective of academic research? Putting in place highly complex engineering models that simply exploit computing power and massive annotated data? The community (and I guess every community) was chasing benchmarks and low-hanging fruit in the past too, but at that time there was room for other directions as well, which no longer seems to be the case. This holds not only for conferences but for funding as well, and its direct consequence is a rapid decrease of the "theoretical depth" of research in the field, or, I should say instead, of research diversity.

It might simply be that deep learning on highly complex graphs, enormous in terms of degrees of freedom, once endowed with massive amounts of annotated data and computing power that was unthinkable until very recently, can solve all computer vision problems. If this is the case, then it is simply a matter of time before industry takes over (which already seems to be happening), research in computer vision becomes a marginal academic objective, and the field follows the path of computer graphics (in terms of activity and volume of academic research).

If not, though, one can ask how computer vision will move to the next level. How will new ideas emerge from a community where incoming PhD students have never heard, and most likely will never hear, about statistical learning, pattern recognition, Euclidean geometry, continuous and discrete optimization, and so on? I am a believer in a broad and rich scientific culture, and I have the impression that it is in the process of disappearing from the field. One can envision two possible interpretations. There is a highly positive one: we are converging towards David Marr's famous theory, which assumes that a single computational framework can address visual perception. This would be a great accomplishment for a field that was at 5% accomplishment in 1995 (recall Prof. Thomas Huang's presentation at the ICPR'96 conference). There is a less positive interpretation, though, in which we are putting all our efforts, while excluding alternatives, into an area that shows great promise but will still not be able to address on its own the rich variety of problems in computer vision.

A very good friend once mentioned to me that there are three stages of deep learning: denial, doubt, and acceptance/adoption! I guess I am navigating the ocean between the last two stages without a compass.

follow me on twitter: @agonigrammi

During the Ph.D. proposal, the student said he's gonna do machine learning. Then I said, "So.. the machine will learn, ye? But.. then.. what are YOU gonna learn?"

Neil Robertson

Computer Vision Technologist, Serial Tech Founder

6y

Nikos, we have a paper accepted to ICCV that shows - in the really challenging cases - that combining attributes (non-DL) with DL features actually outperforms direct convnet computation for the general face recognition problem. This isn't a "push back" on our part, it's what works. To that end we have a special session at Face and Gesture 2018 which calls for explicit exploration of where DL and non-DL features are successful in face recognition.
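
As a rough illustration of the kind of attribute/DL fusion described here, consider a minimal sketch: hand-crafted attribute scores are normalized and concatenated with a deep embedding before classification. The function name and feature dimensions below are hypothetical, not the actual pipeline from the ICCV paper:

```python
import numpy as np

# Minimal sketch of attribute + deep-feature fusion (hypothetical, not the
# actual ICCV pipeline): L2-normalize each modality, then concatenate the
# two vectors into a single descriptor for a downstream classifier.
def fuse_features(attributes: np.ndarray, deep_embedding: np.ndarray) -> np.ndarray:
    a = attributes / (np.linalg.norm(attributes) + 1e-8)
    d = deep_embedding / (np.linalg.norm(deep_embedding) + 1e-8)
    return np.concatenate([a, d])

# Example: 40 attribute scores fused with a 512-d convnet embedding.
fused = fuse_features(np.random.rand(40), np.random.rand(512))
print(fused.shape)  # (552,) -- fed to any downstream classifier
```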

If we have the same good friend: it's denial, doubt, and obvious; but I'm still between the first two stages ;)

I've been looking at deep learning and CV for the first time this weekend, and that is exactly the sense I got from reading a few papers and going through many websites devoted to it. I thought maybe it was my own "culture shock" from exploring a different field for the first time, but perhaps not.

Phil Teare

Head of Machine Perception | Centre for AI | Data Science & AI | BP R&D at AstraZeneca

6y

I suspect the analogy will trend towards biology rather than physics. Still science. Still a lot of empirical work needed. But less focus on the minuscule/abstract and more on macro systems and the practical study of very complex systems. But the small details (just as in biology) will still matter a great deal. Do we belittle medical science for running large trials on the effectiveness of drugs, rather than focusing purely on theory and design? No. Industry and large-scale technology will play an ever increasing role, but we still see excellent fundamental work (e.g. SELU, just last week).
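
For readers who have not come across it, "SELU" refers to the scaled exponential linear unit from the self-normalizing networks paper (Klambauer et al., 2017). A minimal NumPy sketch of the activation itself, using the published constants (the wrapper code is just illustrative):

```python
import numpy as np

# SELU (scaled exponential linear unit), Klambauer et al., 2017.
# The two constants are the published values that yield the
# self-normalizing property.
ALPHA = 1.6732632423543772
SCALE = 1.0507009873554805

def selu(x: np.ndarray) -> np.ndarray:
    # scale * x for x > 0, scale * alpha * (exp(x) - 1) otherwise
    return SCALE * np.where(x > 0, x, ALPHA * (np.exp(x) - 1.0))

print(selu(np.array([-1.0, 0.0, 2.0])))  # approx [-1.1113, 0.0, 2.1014]
```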
