I finally have something that mostly works. Attached are example input and result.

The algorithm I designed works as follows:

1. Compute the sum of the Fourier transforms of all rows of the image and, separately, the sum for all columns.
2. For both sums, look for the 'base frequency', i.e. the size of the pixel grid. There are two variants here: if the pixel art does not have grid lines, the base frequency is indicated by periodic values very close to zero ("antipeaks"), and if it does, there are periodic peaks. The spacing between peaks or antipeaks corresponds to the number of art pixels along that axis. I use an empirically derived scoring scheme to pick the spacing.
3. Sample the image at the center of each grid cell. Currently I use bilinear sampling at the cell center, but something more clever would do better at removing defects in the image. For instance, doing cluster analysis on each cell and picking the average of the most numerous cluster should work.
4. Quantize colors. I wanted to avoid having to specify the number of colors in advance, so I used single-link clustering in Lab color space. Complete-link would probably work better.
5. Output the resulting image as PNG.
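
Steps 1-3 can be sketched roughly as follows for the variant without grid lines. This is only an illustration under several assumptions, not the actual implementation: the image is a single-channel list of rows, the row and column transforms are summed as magnitudes, a naive DFT stands in for a real FFT, the "scoring" is a plain near-zero test rather than the empirical scheme, and sampling is nearest-pixel rather than bilinear. All function names are made up for the example.

```python
import cmath

def dft_magnitudes(signal):
    """Naive O(n^2) DFT magnitudes; a real implementation would use an FFT."""
    n = len(signal)
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, x in enumerate(signal)))
        for k in range(n)
    ]

def summed_spectrum(lines):
    """Sum the magnitude spectra of equally long lines (rows or columns)."""
    total = [0.0] * len(lines[0])
    for line in lines:
        for k, magnitude in enumerate(dft_magnitudes(line)):
            total[k] += magnitude
    return total

def count_art_pixels(spectrum, tolerance=1e-6):
    """Return the smallest antipeak spacing m, or None if there is none.

    A spacing m qualifies when the spectrum is nearly zero (relative to the
    strongest non-DC bin) at every multiple of m. Assumption: the image
    border is aligned with the grid, so m divides the line length.
    """
    n = len(spectrum)
    peak = max(spectrum[1:], default=0.0)
    if peak == 0.0:
        return None  # completely flat input, no grid to find
    for m in range(1, n // 2 + 1):
        if n % m:
            continue
        if all(spectrum[k] <= tolerance * peak for k in range(m, n, m)):
            return m
    return None

def sample_centers(image, n_cols, n_rows):
    """Take one value per art pixel at the cell center (nearest pixel)."""
    height, width = len(image), len(image[0])
    cell_w, cell_h = width / n_cols, height / n_rows
    return [
        [image[int((r + 0.5) * cell_h)][int((c + 0.5) * cell_w)]
         for c in range(n_cols)]
        for r in range(n_rows)
    ]

# A 3x2 piece of "pixel art" enlarged 4 times, giving a 12x8 image.
art = [[10, 200, 30], [90, 40, 250]]
image = [[art[y // 4][x // 4] for x in range(12)] for y in range(8)]
columns = [list(col) for col in zip(*image)]

n_cols = count_art_pixels(summed_spectrum(image))    # from the rows
n_rows = count_art_pixels(summed_spectrum(columns))  # from the columns
restored = sample_centers(image, n_cols, n_rows)
```

The near-zero test only works on clean input; on real images the antipeaks are not exactly zero, which is where a scoring scheme becomes necessary.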

The only tunable value in the algorithm is the clustering diameter; everything else is automatic. A value of deltaE = 8 seems to work fairly well.
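
To illustrate how that single parameter drives the quantization, here is a minimal sketch of threshold-based single-link clustering. It assumes the colors are already converted to Lab, that deltaE means the CIE76 difference (plain Euclidean distance in Lab), and that the "clustering diameter" acts as the maximum linking distance between two colors. The function names are hypothetical.

```python
import math

def single_link_clusters(lab_colors, max_delta_e=8.0):
    """Group Lab colors by single-link clustering cut at max_delta_e.

    Two colors land in the same cluster if a chain of colors connects
    them in which each step is at most max_delta_e (CIE76) long.
    """
    parent = list(range(len(lab_colors)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(lab_colors)):
        for j in range(i + 1, len(lab_colors)):
            if math.dist(lab_colors[i], lab_colors[j]) <= max_delta_e:
                parent[find(i)] = find(j)

    clusters = {}
    for i, color in enumerate(lab_colors):
        clusters.setdefault(find(i), []).append(color)
    return list(clusters.values())

def cluster_means(clusters):
    """Average each cluster channel by channel to get the palette."""
    return [
        tuple(sum(channel) / len(cluster) for channel in zip(*cluster))
        for cluster in clusters
    ]

colors = [(50, 0, 0), (55, 0, 0), (60, 0, 0), (90, 10, 10)]
clusters = single_link_clusters(colors, max_delta_e=8.0)
palette = cluster_means(clusters)
```

In this example the first three colors end up in one cluster even though the outer two are 10 apart, because single-link only looks at the nearest neighbor. That chaining effect is the reason complete-link might behave better.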

The algorithm works very well if the following conditions are met:
1. The art pixel size is at least 5-6 image pixels.
2. Image borders are aligned with the pixel art grid.
3. There is no skew or other deformation of the image.

Small pixel sizes, e.g. 2-5, can be handled by simply checking, for each candidate set of grid parameters, whether the resulting art pixels contain approximately uniform colors.
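
A minimal sketch of such a check, assuming a single-channel image stored as a list of rows, square art pixels, and a grid aligned with the image border. The function names and the `tolerance` parameter are invented for the example.

```python
def cells_are_uniform(image, cell, tolerance=0):
    """Check that every cell x cell block varies by at most tolerance."""
    height, width = len(image), len(image[0])
    if height % cell or width % cell:
        return False  # the grid would not line up with the image border
    for top in range(0, height, cell):
        for left in range(0, width, cell):
            values = [
                image[y][x]
                for y in range(top, top + cell)
                for x in range(left, left + cell)
            ]
            if max(values) - min(values) > tolerance:
                return False
    return True

def largest_uniform_cell(image, candidates=range(2, 6), tolerance=0):
    """Return the largest candidate cell size whose cells are all uniform."""
    passing = [c for c in candidates if cells_are_uniform(image, c, tolerance)]
    return max(passing, default=None)

# A 3x2 piece of pixel art enlarged 4 times, giving a 12x8 image.
art = [[10, 200, 30], [90, 40, 250]]
image = [[art[y // 4][x // 4] for x in range(12)] for y in range(8)]
cell_size = largest_uniform_cell(image)
```

The largest passing size is taken because any divisor of the true pixel size also passes the test (here both 2 and 4 do).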

Limitations 2 and 3 could be lifted by preprocessing the image before the FFT step. The Python script posted earlier in this thread can detect the edges of the pixel art, but it doesn't work well in some cases, for example with sketches where the individual squares have jagged edges. However, I think doing a Hough transform to detect lines and then finding a projective transformation that straightens the image could be promising. Another idea is to find areas of approximately uniform color, mark them using something like flood fill, and then process the edges.
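
The flood fill idea could start from something like the sketch below, which only does the marking part and leaves the edge processing open. It assumes a single-channel image as a list of rows and 4-connectivity; the function name and the `tolerance` parameter are made up for the example.

```python
from collections import deque

def label_uniform_regions(image, tolerance=0):
    """Label 4-connected regions of approximately uniform value.

    A pixel joins a region if it differs from an already labeled
    neighbor by at most tolerance. Returns (labels, region_count).
    """
    height, width = len(image), len(image[0])
    labels = [[-1] * width for _ in range(height)]
    next_label = 0
    for start_y in range(height):
        for start_x in range(width):
            if labels[start_y][start_x] != -1:
                continue  # already part of an earlier region
            labels[start_y][start_x] = next_label
            queue = deque([(start_y, start_x)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and labels[ny][nx] == -1
                            and abs(image[ny][nx] - image[y][x]) <= tolerance):
                        labels[ny][nx] = next_label
                        queue.append((ny, nx))
            next_label += 1
    return labels, next_label

image = [[1, 1, 9],
         [1, 9, 9]]
labels, count = label_uniform_regions(image)
```

Because each pixel is compared with its neighbor rather than with the region's seed, a nonzero tolerance lets a region drift across a smooth gradient; comparing against the seed value would be the stricter alternative.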

I will soon post the code publicly and then integrate this into Inkscape as an optional step in the Trace Pixel Art dialog, but can't give an ETA yet. I think late March is realistic for integration. I will need to factor out a few things which are more universally useful, e.g. conversion to Lab color space and color quantization, or use an external library such as Babl or LCMS.

Regards, Krzysztof