Deep residual coalesced convolutional network for efficient semantic road segmentation

Table 2 Comparison on the CamVid dataset [16] using 11 road scene categories (in percent)

Method	Sky	Building	Road	Sidewalk	Car	Pedestrian	Bicyclist	Tree	Fence	Column-pole	Sign-symbol	Class avg.	Class IoU
Local label descriptor [1]	88.8	80.7	98	12.4	16.4	1.09	0.07	61.5	0.05	4.13	n/a	36.3	n/a
Boosting+pairwise CRF [2]	94.7	70.7	94.1	79.3	74.4	45.7	23.1	70.8	37.2	13	55.9	59.9	n/a
Boosting+detection+CRF [3]	96.2	81.5	93.9	81.5	78.7	43	33.9	76.6	47.6	14.3	40.2	62.5	n/a
Dense depth map [4]	95.4	85.3	98.5	38.1	69.2	23.8	28.7	57.3	44.3	22	46.5	55.4	n/a
Super parsing [5]	96.9	87	95.9	70	62.7	14.7	19.4	67.1	17.9	1.7	30.1	51.2	n/a
SegNet-basic [8]	91.2	75	93.3	74.1	82.7	55	16	84.6	47.5	44.8	36.9	62	47.7
SegNet [8]	92.4	88.8	97.2	84.4	82.1	57.1	30.7	87.3	49.3	27.5	20.5	65.2	55.6
ENet [9]	95.1	74.7	95.1	86.7	82.4	67.2	34.1	77.8	51.7	35.4	51	68.3	51.3
RCC-Net (sum)	95.2	70.1	94.1	90.1	82.6	70.6	45.7	81.2	51	52.3	35.4	69.8	52.6
RCC-Net (concatenated)	94.3	71.8	92.6	92.7	79.3	57.7	65.6	80.5	35.7	57.4	59.4	71.5	53.3
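
The "Class avg." column appears to be the unweighted mean of the 11 per-category scores in each row. The source does not state this definition, so the following minimal sketch treats it as an assumption and checks it against the two RCC-Net rows. The variable and function names are illustrative, not taken from the paper.

```python
# Per-category scores (percent) copied from the two RCC-Net rows of Table 2,
# in column order: Sky, Building, Road, Sidewalk, Car, Pedestrian,
# Bicyclist, Tree, Fence, Column-pole, Sign-symbol.
rcc_net_sum = [95.2, 70.1, 94.1, 90.1, 82.6, 70.6, 45.7, 81.2, 51.0, 52.3, 35.4]
rcc_net_concat = [94.3, 71.8, 92.6, 92.7, 79.3, 57.7, 65.6, 80.5, 35.7, 57.4, 59.4]


def class_average(per_class):
    """Unweighted mean over categories (assumed definition of 'Class avg.')."""
    return sum(per_class) / len(per_class)


for name, scores in [("RCC-Net (sum)", rcc_net_sum),
                     ("RCC-Net (concatenated)", rcc_net_concat)]:
    print(f"{name}: {class_average(scores):.1f}")
```

Under this assumption the computed means round to the reported 69.8 and 71.5. The "Class IoU" column cannot be recomputed this way, because it depends on per-class intersection and union counts that the table does not list.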