A kernel function k(x, y) is a function that computes phi(x)^T phi(y) for some feature map phi. That means it calculates the dot product of two data points in a (typically higher-dimensional) feature space; it does not carry out the transformation phi explicitly.
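As a small illustration (my own sketch, not taken from Bishop), take the degree-2 polynomial kernel k(x, y) = (x^T y)^2 on 2D inputs. One feature map that matches it is phi(x) = (x1^2, sqrt(2) x1 x2, x2^2). The names `phi`, `dot` and `poly_kernel` below are made up for this example:

```python
import math

def dot(a, b):
    """Plain dot product of two equal-length sequences."""
    return sum(ai * bi for ai, bi in zip(a, b))

def phi(x):
    """Explicit feature map for k(x, y) = (x^T y)^2 on 2D inputs (assumed example)."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)

def poly_kernel(x, y):
    """Same value as dot(phi(x), phi(y)), but phi is never built."""
    return dot(x, y) ** 2

x = (1.0, 2.0)
y = (3.0, 4.0)

explicit = dot(phi(x), phi(y))  # transform first, then take the dot product
implicit = poly_kernel(x, y)    # work directly in the input space

print(explicit, implicit)       # the two values agree up to floating-point rounding
```

Both routes give the same number, but the kernel only ever touches the original 2D vectors. For kernels such as the Gaussian one, the feature space is infinite-dimensional, so the explicit route is not available at all.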
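That implies that kernels are not tied to the Gram matrix: the Gram matrix is only what you get when you evaluate a kernel on every pair of points in a data set. Kernels can also be constructed in completely different ways, e.g. Fisher kernels, which are derived from a generative probabilistic model.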
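To make the relationship between a kernel and the Gram matrix concrete, here is a minimal sketch, assuming a Gaussian (RBF) kernel with an arbitrarily chosen `gamma`. The helper names `rbf_kernel` and `gram_matrix` are hypothetical:

```python
import math

def rbf_kernel(x, y, gamma=0.5):
    """Gaussian kernel k(x, y) = exp(-gamma * ||x - y||^2); gamma is an assumed value."""
    sq_dist = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return math.exp(-gamma * sq_dist)

def gram_matrix(points, kernel):
    """K[i][j] = kernel(points[i], points[j]) for every pair of points."""
    return [[kernel(p, q) for q in points] for p in points]

points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
K = gram_matrix(points, rbf_kernel)

for row in K:
    print(row)
```

The kernel is a function of two arbitrary points; the Gram matrix is just a table of its values on one particular data set, which is why it comes out symmetric with k(x, x) on the diagonal.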
(What I wrote is based on Bishop's book and others.)