SoSSFM

Abstract

Several tasks in computer vision and machine learning can be modeled as MRF-MAP inference problems. Using higher order potentials to model complex dependencies can significantly improve performance. The problem can often be posed as minimizing a sum of submodular (SoS) functions. Since a sum of submodular functions is itself submodular, existing submodular function minimization (SFM) techniques can be employed for optimal inference in polynomial time [1], [2]. These techniques, though oblivious to clique size, have limited scalability in the number of pixels. On the other hand, state of the art algorithms in computer vision [3], [4] can handle problems with a large number of pixels but fail to scale to large clique sizes. In this paper, we adapt two SFM algorithms [1], [5] to exploit the sum of submodular structure, thereby helping them scale to a large number of pixels while maintaining scalability with large clique sizes. Our ideas are general and can be extended to adapt other existing SFM algorithms as well. Our experiments on computer vision problems demonstrate that our approach easily scales up to clique sizes of 300, thereby unlocking the use of very large cliques for MRF-MAP inference problems.
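
Concretely, the sum of submodular structure referred to above is an energy of the form below; this is the standard SoS formulation, written here in our own notation rather than copied from the paper:

    \[
      \min_{S \subseteq V} \; E(S) \;=\; \sum_{c \in \mathcal{C}} f_c(S \cap c),
      \qquad
      f_c(A) + f_c(B) \;\ge\; f_c(A \cup B) + f_c(A \cap B) \quad \forall\, A, B \subseteq c,
    \]

where V is the set of pixels (S encodes the pixels assigned label 1), C is the collection of cliques, and each f_c is submodular on its own clique. Since submodularity is closed under addition, E itself is submodular and generic SFM applies; exploiting the per-clique decomposition is what lets the adapted algorithms remain scalable in the number of pixels.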

Results

[Figure: columns show the original input, TextonBoost, Pairwise, SoS-MinNorm (Small Cliques), and SoS-MinNorm (Large Cliques) results.]

Results for pixel level object detection using our SoS-MinNorm algorithm. We generate per-pixel confidences from the probabilities produced by TextonBoost. The second column, titled TextonBoost, is generated from these confidences alone. The third column shows the results using cliques of size 2 only. To generate higher order cliques for our algorithm, we use region growing as suggested by Stobbe and Krause [7]. The fourth column shows the results of our algorithm with smaller cliques; here region growing was restricted to size 50, resulting in an average clique size of 31. For testing with larger cliques, we allowed region growing until size 300 was reached, which generates cliques with an average size of 230. We used a count based cost for our algorithm. The quality of the results appears to improve with increasing clique size.
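
As a rough illustration of the count based cost mentioned above, a clique potential that is a concave function of how many pixels in the clique take label 1 is submodular. The sketch below is ours; the truncation factor and the exact functional form are assumptions, not necessarily the potential used in the experiments:

    def count_potential(labels_in_clique, truncation=0.1):
        # Count based clique potential: a concave function of the number of
        # pixels labelled 1 inside the clique. A concave function of the
        # cardinality is submodular, so summing one such term per clique
        # gives a sum-of-submodular (SoS) energy.
        k = sum(labels_in_clique)        # pixels labelled 1 in this clique
        n = len(labels_in_clique)
        # Penalise label disagreement within the clique, truncated so the
        # penalty saturates for large cliques.
        return min(k, n - k, truncation * n)

Summing such a term over all cliques produced by region growing gives an objective of the kind minimized by SoS-MinNorm in these experiments, up to the exact potential used in the paper.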



[Figure: columns show Image, Interaction, Result1, Result2, Result3.]

Results of our experiments on the interactive object segmentation problem. The unary cost for the inference is based on user interaction, whereas the higher order cliques are generated using the same region growing as in the object detection problem. The first and second columns show the input image and the user inputs respectively, whereas the third to fifth columns show the results of the SoS-MinNorm algorithm with increasing average clique size. For the first row, the image size is 103 × 132, the average clique sizes are (181, 500, 1030), and the corresponding run times are (0.11, 0.47, 1.15) seconds; for the second row, they are 223 × 142, (66, 332, 814), and (0.16, 0.38, 1.13) seconds. As in the object detection problem, our experiments show improved visual quality with increasing clique size.
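
A minimal sketch of the region growing used to build the higher order cliques in both experiments, following the idea of Stobbe and Krause [7]; the 4-connectivity, the intensity threshold, and the single-channel input are our assumptions:

    import numpy as np

    def grow_cliques(image, max_size=300, tol=10.0):
        # Greedily grow cliques of similar pixels over a 4-connected grid.
        # `image` is assumed to be a single-channel (grayscale or confidence)
        # array; returns a list of cliques, each a list of (row, col) pixels.
        h, w = image.shape
        visited = np.zeros((h, w), dtype=bool)
        cliques = []
        for sr in range(h):
            for sc in range(w):
                if visited[sr, sc]:
                    continue
                seed = float(image[sr, sc])
                clique, stack = [], [(sr, sc)]
                while stack and len(clique) < max_size:
                    r, c = stack.pop()
                    if visited[r, c] or abs(float(image[r, c]) - seed) > tol:
                        continue
                    visited[r, c] = True
                    clique.append((r, c))
                    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        nr, nc = r + dr, c + dc
                        if 0 <= nr < h and 0 <= nc < w and not visited[nr, nc]:
                            stack.append((nr, nc))
                cliques.append(clique)
        return cliques

Setting max_size to 50 or 300 corresponds to the small-clique and large-clique settings reported in the captions above.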



ε            1                     0.5                   0.1                  0.05                  0.01
Image 1      (0.03, 922, -757)     (0.09, 922, -757)     (0.15, -95, -190)    (0.39, -151, -179)    (3.66, -177, -177)
Image 2      (0.4, 8862, -2327)    (0.9, 799, -956)      (2.4, -204, -405)    (7.2, -287, -336)     (73.6, -309, -310)
Image 3      (0.39, 8444, -2525)   (1.04, 7300, -619)    (1.75, -41, -124)    (5.51, -982, -110)    (72.5, -106, -106)


By changing the value of ε in Step 5 of Procedure 5, our algorithm can be used as an approximation algorithm with faster inference time. Each entry above gives the time taken (in seconds), followed by the primal and dual values (in thousands). The average clique size for the problem is approximately 250, and we used a count based clique potential. Note that as ε is decreased, the inferred primal value decreases and the dual value increases, showing progress towards optimality. Interestingly, there is a corresponding improvement in visual quality as well. This approximation strategy should be useful for applications with a limited time budget.
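
The approximation behaviour above follows a standard primal-dual stopping rule: iterate until the gap between the current primal energy and the dual lower bound drops below ε, so a larger ε stops earlier with a rougher labelling. The loop below is a generic sketch of this idea in our own notation; it does not reproduce the actual Step 5 of Procedure 5:

    def run_until_gap(step, primal_energy, dual_bound, state, eps):
        # Generic epsilon-stopping loop: `step` performs one iteration of the
        # minimization, `primal_energy` evaluates the current labelling, and
        # `dual_bound` returns a lower bound on the optimum. Decreasing eps
        # trades running time for a smaller primal-dual gap, matching the
        # trend in the table above.
        while primal_energy(state) - dual_bound(state) > eps:
            state = step(state)
        return state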




Clique Size      2×2       4×2        4×4       4×6

SoS-Jegelka      861.07    12217.37   TO        TO
GC               0.00      0.26       731.67    DNR
LGC              0.03      0.046      154.73    TO
SoS-Schrijver    0.27      7.33       153.21    1273.14
SoS-MinNorm      0.01      0.01       0.02      0.02

Comparing the adapted SoS algorithms with LGC [4], GC [3] and SoS-Jegelka [6] for varying clique sizes (2×2 up to 4×6). We use edge based potentials for the experiment, and the numbers denote the time taken in seconds. Our approach performs better than the other approaches at all clique sizes. SoS-Jegelka timed out (TO, more than 4 hours) for the two largest clique sizes, and GC did not run (DNR) on the largest.




Min Norm Point Algorithm for Higher Order MRF-MAP Inference

Abstract

Many tasks in computer vision and machine learning can be modelled as inference problems in an MRF-MAP formulation and can be reduced to minimizing a submodular function. Using higher order clique potentials to model complex dependencies between pixels improves performance, but the current state of the art inference algorithms fail to scale to larger clique sizes. We adapt the well known Min Norm Point algorithm from the mathematical optimization literature to exploit the sum of submodular structure found in the MRF-MAP formulation. Unlike some contemporary methods, we make no assumptions (other than submodularity) on the type of the clique potentials. The proposed algorithm can scale to clique sizes of many hundreds and beyond. The current best reported clique size, for a general class of submodular functions, is 16, with a runtime of many hours. Our algorithm, in contrast, is highly efficient and can perform optimal inference in a few seconds even on clique sizes an order of magnitude larger. We demonstrate the efficacy of our approach through experiments on synthetic as well as real datasets.
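
The Min Norm Point algorithm [5] repeatedly needs a linear optimization oracle over the base polytope of a submodular function; in the SoS setting this oracle is invoked clique by clique. The oracle is Edmonds' greedy algorithm, a standard construction sketched below in our own code (not the paper's implementation):

    def greedy_extreme_base(f, ground_set, weights):
        # Edmonds' greedy algorithm: given a normalized submodular function f
        # (f of the empty list is 0, evaluated here on Python lists of
        # elements) and per-element weights, return the extreme base y of the
        # base polytope B(f) minimizing <weights, y>. Min norm point style
        # algorithms call this oracle repeatedly; in the SoS setting it is
        # applied to each clique's potential separately.
        order = sorted(ground_set, key=lambda e: weights[e])  # ascending weight
        y, prefix, prev = {}, [], 0.0
        for e in order:
            prefix.append(e)
            val = f(prefix)         # f evaluated on the growing prefix
            y[e] = val - prev       # marginal value gives the base coordinate
            prev = val
        return y

Here f can be any per-clique submodular potential (for instance a count based one), and the weights come from the algorithm's current iterate.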

Results

[Figure: columns show the original input, TextonBoost, Pairwise, SoS-MinNorm (Small Cliques), and SoS-MinNorm (Large Cliques) results.]

Pixel level object detection: We generate per-pixel confidences from the probabilities produced by TextonBoost. The second column, titled TextonBoost, is generated from these confidences alone. The third column shows the results using cliques of size 2 only. To generate higher order cliques for our algorithm, we use region growing. The fourth column shows the results of our algorithm with region growing restricted to size 50, resulting in an average clique size of 31. For testing with larger cliques, we allowed region growing until size 300, generating cliques with an average size of 230. We used a count based cost. The quality of the results appears to improve with increasing clique size.



[Figure: columns show Image, Interaction, Result1, Result2, Result3.]

Interactive object segmentation: The unary cost for the inference is based on user interaction, whereas the higher order cliques are generated using the same region growing as in the object detection problem. The first and second columns show the input image and the user inputs respectively, whereas the third to fifth columns show the results of our approach with increasing average clique size. As in the object detection problem, our experiments show improved visual quality with increasing clique size.




Clique Size      2×2       4×2        4×4       4×6

SoS-MinNorm      0.01      0.01       0.02      0.02
Generic Cuts     0.00      0.26       731.67    DNR
BCD              0.03      0.046      154.73    TO

Comparing SoS-MinNorm with Generic Cuts [3] and block coordinate descent (BCD) [6] on varying clique sizes, using edge based potentials. The problem size is fixed at 400, and the numbers denote the time taken in seconds. DNR indicates that the algorithm crashed on the test; TO denotes a timeout for the decomposition approach after 4 hours of running. SoS-MinNorm significantly outperforms both existing algorithms, neither of which can scale beyond clique size 16.




References

  1. A. Schrijver, “A combinatorial algorithm minimizing submodular functions in strongly polynomial time,” Journal of Combinatorial Theory, Series B, vol. 80, no. 2, pp. 346–355, 2000.
  2. J. B. Orlin, “A faster strongly polynomial time algorithm for submodular function minimization,” Mathematical Programming, vol. 118, no. 2, pp. 237–251, 2009.
  3. C. Arora, S. Banerjee, P. Kalra, and S. Maheshwari, “Generalized flows for optimal inference in higher order MRF-MAP,” IEEE TPAMI, vol. 37, no. 7, pp. 1323–1335, 2015.
  4. D. Khandelwal, K. Bhatia, C. Arora, and P. Singla, “Lazy generic cuts,” Computer Vision and Image Understanding, vol. 143, pp. 80–91, 2016.
  5. S. Fujishige and S. Isotani, “A submodular function minimization algorithm based on the minimum-norm base”, Pacific Journal of Optimization, vol. 7, pp. 3–17, 2011.
  6. S. Jegelka, F. Bach, and S. Sra, “Reflection methods for user-friendly submodular optimization,” in Proc. of NIPS, 2013, pp. 1313–1321.
  7. P. Stobbe and A. Krause, “Efficient minimization of decomposable submodular functions,” in Proc. of NIPS, 2010, pp. 2208–2216.