I had been misunderstanding the role of PDF.js. I had assumed that it accessed the PDF as natively rendered by Chrome. It doesn't - it fetches the file itself, parses the binary data in a worker, and works with that parsed representation, bypassing Chrome's rendered version entirely. This means that accessing selected text in Chrome's natively rendered PDF is not possible through PDF.js - or, in fact, at all. Chrome's viewer uses PDFium, a project based on Foxit's PDF SDK, a third-party engine that doesn't expose an API to the page, largely for security reasons.
This means that to get selected text, we have to render the PDF ourselves: load it into local storage, stop Chrome from rendering it natively, re-render it with PDF.js, and then use the PDF.js API to access it. This is significantly slower, but it is the only feasible way of accessing the PDF the way we want to.
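As a rough sketch of the last step (reading text through the PDF.js API once we have loaded the document ourselves), something like the following could work. The interfaces and the `extractText` helper are my own illustrative names, not part of PDF.js; they only describe the parts of a loaded document that the sketch relies on (`numPages`, `getPage`, `getTextContent`, and the `str` field on text items). In the browser, the document would presumably come from `getDocument(url).promise` in `pdfjs-dist`.

```typescript
// Hypothetical structural types: only the parts of a loaded PDF.js
// document that this sketch uses.
interface TextContentLike {
  items: unknown[];
}
interface PageLike {
  getTextContent(): Promise<TextContentLike>;
}
interface DocumentLike {
  numPages: number;
  getPage(pageNumber: number): Promise<PageLike>;
}

// Text items carry their string in `str`; other entries (such as
// marked-content markers) have no `str` and yield an empty string here.
function itemText(item: unknown): string {
  const str = (item as { str?: unknown }).str;
  return typeof str === "string" ? str : "";
}

// Collect the text of every page. PDF.js numbers pages from 1.
async function extractText(doc: DocumentLike): Promise<string[]> {
  const pages: string[] = [];
  for (let n = 1; n <= doc.numPages; n++) {
    const page = await doc.getPage(n);
    const content = await page.getTextContent();
    const text = content.items
      .map(itemText)
      .filter((s) => s.length > 0)
      .join(" ");
    pages.push(text);
  }
  return pages;
}
```

Typing the helper against a small structural interface rather than the library's own classes keeps it easy to exercise with a stub document, which is handy while the loading side is still in flux.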
Most platforms host the PDF on their own servers instead of rendering it locally (PeerLibrary, for instance). I'm still not sure how Hypothes.is does it, but that is worth checking out. For now, loading the PDF locally gives us the added benefit of being able to easily integrate the viewer components that come with the PDF.js examples. Converting these into React may be a good move in itself, if it hasn't already been done. Once we're using an instance of pdfViewer, we can access and control highlighting easily.
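Once the viewer is rendering the PDF, my understanding is that it lays a text layer of ordinary DOM elements over each page, so the browser's own selection should work on it. A minimal sketch of reading the selection, restricted to the viewer's container, might look like this. `selectedTextWithin` and the `*Like` interfaces are hypothetical names of mine; in the browser, the arguments would be the viewer's container element and `window.getSelection()`.

```typescript
// Hypothetical structural types covering the parts of the DOM
// Selection / Range / Element APIs that this sketch uses.
interface RangeLike<N> {
  commonAncestorContainer: N;
}
interface SelectionLike<N> {
  rangeCount: number;
  getRangeAt(index: number): RangeLike<N>;
  toString(): string;
}
interface ContainerLike<N> {
  contains(node: N | null): boolean;
}

// Return the selected text if the selection lies inside the container,
// otherwise null (no selection, empty selection, or selection elsewhere).
function selectedTextWithin<N>(
  container: ContainerLike<N>,
  selection: SelectionLike<N> | null
): string | null {
  if (!selection || selection.rangeCount === 0) return null;
  const range = selection.getRangeAt(0);
  if (!container.contains(range.commonAncestorContainer)) return null;
  const text = selection.toString().trim();
  return text.length > 0 ? text : null;
}
```

Checking the range's common ancestor against the container is there so that text selected elsewhere on the page (toolbars, sidebars) is not mistaken for a selection in the PDF.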
Current goals: