While choosing a kernel bandwidth can feel arbitrary in some situations, the parameter does have a nice property: it informally defines how "close" points need to be in order to be considered similar. That works well in problem domains where such a distance is easy to determine.
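To make the effect of the bandwidth concrete, here is a minimal sketch of mean shift on 1-D data. It assumes a flat (uniform) kernel and a simple rule that merges modes lying within one bandwidth of each other; the function name `mean_shift_1d` and the toy data are made up for illustration and are not taken from any library.

```python
def mean_shift_1d(points, bandwidth, max_iter=100, tol=1e-6):
    """Flat-kernel mean shift on 1-D data (illustrative sketch only)."""
    modes = []
    for x in points:
        # Repeatedly move x to the mean of all points within `bandwidth` of it.
        for _ in range(max_iter):
            neighbors = [p for p in points if abs(p - x) <= bandwidth]
            new_x = sum(neighbors) / len(neighbors)
            if abs(new_x - x) < tol:
                break
            x = new_x
        modes.append(x)

    # Assumption: modes within one bandwidth of an existing center are merged.
    centers, labels = [], []
    for m in modes:
        for i, c in enumerate(centers):
            if abs(m - c) <= bandwidth:
                labels.append(i)
                break
        else:
            centers.append(m)
            labels.append(len(centers) - 1)
    return labels, centers


# Hypothetical toy data: three tight groups near 1, 5 and 9.
data = [1.0, 1.2, 0.9, 5.0, 5.1, 4.8, 9.0, 9.3, 9.1]

labels_small, centers_small = mean_shift_1d(data, bandwidth=1.0)
labels_large, centers_large = mean_shift_1d(data, bandwidth=5.0)

print(len(centers_small), labels_small)  # three clusters
print(len(centers_large), labels_large)  # everything merged into one cluster
```

With a bandwidth of 1.0 the three groups stay separate, while a bandwidth of 5.0 treats all points as "close" and collapses them into a single cluster. If you know roughly how far apart similar points are in your domain, that distance is a sensible starting value.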
As with most clustering problems, if you can't choose a reasonable set of parameter values from domain-specific information, it comes down largely to trial and error. There are, of course, several metrics that can be used to score one clustering (and therefore the parameter values that produced it) against another, as described in your Wikipedia link.
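As a rough illustration of scoring candidate clusterings, the sketch below computes the silhouette coefficient in plain Python. The silhouette is only one common internal metric and is my choice for the example; the page you linked may list others. The function name, the toy data and the two candidate labelings are all hypothetical.

```python
def silhouette_score(points, labels):
    """Mean silhouette coefficient for 1-D points (range -1 to 1, higher is better)."""
    clusters = {}
    for p, label in zip(points, labels):
        clusters.setdefault(label, []).append(p)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    total = 0.0
    for p, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            continue  # convention: a singleton cluster contributes 0
        # a: mean distance to the other members of the point's own cluster
        a = sum(abs(p - q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(abs(p - q) for q in other) / len(other)
            for key, other in clusters.items()
            if key != label
        )
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(points)


# Hypothetical toy data and two candidate labelings, e.g. from two bandwidths.
data = [1.0, 1.2, 0.9, 5.0, 5.1, 4.8, 9.0, 9.3, 9.1]
three_clusters = [0, 0, 0, 1, 1, 1, 2, 2, 2]
two_clusters = [0, 0, 0, 0, 0, 0, 1, 1, 1]

score_three = silhouette_score(data, three_clusters)
score_two = silhouette_score(data, two_clusters)
print(score_three, score_two)
```

On this data the three-cluster labeling scores higher than the two-cluster one, so a trial-and-error loop over bandwidths would keep the parameter value that produced it. Note that the silhouette is undefined for a single cluster, so a bandwidth that merges everything has to be handled separately.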
Another advantage that mean shift has over k-means is that it makes no assumption about cluster shape. K-means implicitly assumes roughly spherical clusters, whereas mean shift allows clusters of any shape because it is driven by the density of the data.