Hi,

I'm seeking advice related to Inkscape Bug #1069248:
 Feature Request: Add 'Export to hOCR' in Save Dialog
 https://bugs.launchpad.net/inkscape/+bug/1069248
 
I've published a draft implementation of the extension (see comments) and am now working on the next step: the ability to create multi-page PDFs from directories of hOCR HTML files and their corresponding images.  I have a standalone draft version of this script, based loosely on HocrConverter ( github.com/jbrinley/HocrConverter ), and I'd like to develop this functionality into a second Inkscape extension, if possible.
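
To make the 'next step' concrete, here is a rough Python sketch of the input stage only: pairing hOCR files with images and pulling out word boxes.  It is not the draft script itself.  It assumes each page is an .html file with an image of the same base name beside it (my assumption, not a requirement of hOCR), that words are marked with class "ocrx_word" and a "bbox x0 y0 x1 y1" entry in the title attribute, and that word elements contain no unclosed tags such as a bare <br>.  The names pair_pages and parse_hocr_words are made up for illustration.

```python
import re
from html.parser import HTMLParser
from pathlib import Path

# hOCR stores geometry in the title attribute, e.g. "bbox 10 20 50 40; x_wconf 95".
BBOX_RE = re.compile(r"bbox\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)")


class HocrWordParser(HTMLParser):
    """Collect (text, (x0, y0, x1, y1)) for every ocrx_word element."""

    def __init__(self):
        super().__init__()
        self.words = []
        self._bbox = None   # bbox of the word currently being read, if any
        self._depth = 0     # nesting depth inside the current word element
        self._text = []

    def handle_starttag(self, tag, attrs):
        if self._bbox is not None:
            self._depth += 1  # nested markup inside a word, e.g. <strong>
            return
        attrs = dict(attrs)
        if "ocrx_word" in (attrs.get("class") or "").split():
            match = BBOX_RE.search(attrs.get("title") or "")
            if match:
                self._bbox = tuple(int(g) for g in match.groups())
                self._depth = 1
                self._text = []

    def handle_endtag(self, tag):
        if self._bbox is None:
            return
        self._depth -= 1
        if self._depth == 0:  # the word element itself has closed
            self.words.append(("".join(self._text).strip(), self._bbox))
            self._bbox = None

    def handle_data(self, data):
        if self._bbox is not None:
            self._text.append(data)


def parse_hocr_words(html_text):
    """Return the list of (text, bbox) pairs found in one hOCR page."""
    parser = HocrWordParser()
    parser.feed(html_text)
    parser.close()
    return parser.words


def pair_pages(directory, image_exts=(".png", ".jpg", ".jpeg", ".tif", ".tiff")):
    """Pair each *.html file with an image of the same base name (assumed layout)."""
    pages = []
    for hocr in sorted(Path(directory).glob("*.html")):
        for ext in image_exts:
            image = hocr.with_suffix(ext)
            if image.exists():
                pages.append((hocr, image))
                break
    return pages
```

Each (hOCR, image) pair would then become one PDF page, with the image drawn first and the words placed at their boxes; that rendering step is left out here.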

So, two questions:

1) What steps are necessary to include the 'Export as hOCR' extension in Inkscape upstream?

2) What would be the best way to integrate the 'next step' extension ("Create Multi-Page PDF from hOCR HTML Files" or similar) into Inkscape?  Most (all?) Inkscape exports/effects/operations operate on single documents, so this one is a little different in scope.

==

An alternative method to create text-searchable PDFs within Inkscape is to open an image, create text boxes with opacity >= 0.4% (tested with Evince 3.4.0), then export as PDF using the current extension of the same name.  The downsides of this approach are that the text remains slightly visible and that the structured metadata (i.e., hOCR) is lost; that metadata is extensible to other applications (e.g., DjVu, moz-hocr-editor).
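
For illustration, the manual approach above could be scripted roughly as follows.  This is only a sketch under my own assumptions: build_overlay_svg is a made-up helper, the words are given as (text, (x0, y0, x1, y1)) tuples in image pixel coordinates, and the font size is simply set to the box height.  The default opacity of 0.004 corresponds to the 0.4% threshold mentioned above; I have not checked other viewers.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("", SVG_NS)
ET.register_namespace("xlink", XLINK_NS)


def build_overlay_svg(image_href, width, height, words, opacity=0.004):
    """Return SVG markup with the image plus nearly transparent text on top."""
    svg = ET.Element(
        f"{{{SVG_NS}}}svg",
        {
            "width": str(width),
            "height": str(height),
            "viewBox": f"0 0 {width} {height}",
        },
    )
    # The scanned page goes in first so the text is drawn over it.
    ET.SubElement(
        svg,
        f"{{{SVG_NS}}}image",
        {
            f"{{{XLINK_NS}}}href": image_href,
            "x": "0",
            "y": "0",
            "width": str(width),
            "height": str(height),
        },
    )
    for text, (x0, y0, x1, y1) in words:
        element = ET.SubElement(
            svg,
            f"{{{SVG_NS}}}text",
            {
                "x": str(x0),
                "y": str(y1),  # SVG text is anchored at its baseline
                "font-size": str(y1 - y0),
                "style": f"fill:#000000;opacity:{opacity}",
            },
        )
        element.text = text
    return ET.tostring(svg, encoding="unicode")


if __name__ == "__main__":
    demo = build_overlay_svg("page1.png", 800, 600, [("Hello", (10, 20, 50, 40))])
    print(demo)
```

The resulting SVG could be opened in Inkscape and exported as PDF in the usual way; it still has both downsides noted above.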


Sincerely,
George