2019-05-24

Voice to text is completely incapable of handling Roman names, and doesn't have a firm grasp of the English language either. I got as far as "Vipsanius was cold and haughty" turning into "Website ass was cold and hot tea" before I gave up on it.

I have come up with a ridiculous OCR workaround.

OCR doesn't like flipping through pages in large PDFs - it has a tendency to freeze, and has no "go to page 70" button. I have no way of editing a PDF down, and OCR doesn't like screenshots taken on a tablet - too low resolution? So I have had to re-download every PDF onto my computer, open them up, flip the view sideways to fit an entire page into the screen at max resolution, take a screenshot, put the screenshot in Paint to crop it and flip it back, and then feed the JPEG to OCR. For some unfathomable reason that works, and I'm most of the way through the files.

I then use the screenshots to proofread the OCR file. OCR has never gotten a quotation mark right in its life, and has invented a single letter that looks exactly like the letters fi but is stuck together into one letter, which terrifies my spellcheck. Here, look at this - ﬁ is one letter, not two letters, see? They look identical, but run your cursor over 'em and you will see. It will also turn an m into an rn and vice versa - my particular favorite example is "whom he bribed to bum down the villa of the lord."
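For anyone who would rather not hunt these down by cursor, here is a rough sketch of how the two problems could be handled in Python. It is not what I actually did, and the function names (expand_ligatures, rn_m_variants) are made up for illustration. It assumes the stuck-together letters are the standard Unicode ligature characters (U+FB00 through U+FB06), and it only suggests m/rn swaps for a human to check against the screenshot, since plenty of real words contain either.

```python
import unicodedata

# Map each Unicode Latin ligature (U+FB00..U+FB06, e.g. the one-character "fi")
# to its plain-letter spelling, using the compatibility decomposition.
LIGATURES = {
    chr(cp): unicodedata.normalize("NFKD", chr(cp))
    for cp in range(0xFB00, 0xFB07)
}
LIGATURE_TABLE = str.maketrans(LIGATURES)


def expand_ligatures(text):
    """Replace ligature characters with ordinary separate letters."""
    return text.translate(LIGATURE_TABLE)


def rn_m_variants(word):
    """Return the spellings obtained by swapping one 'm' for 'rn' or one
    'rn' for 'm'. These are only candidates to check by eye."""
    variants = set()
    for old, new in (("m", "rn"), ("rn", "m")):
        start = word.find(old)
        while start != -1:
            variants.add(word[:start] + new + word[start + len(old):])
            start = word.find(old, start + 1)
    return variants


if __name__ == "__main__":
    print(expand_ligatures("\ufb01re"))  # the ligature becomes two letters
    print(rn_m_variants("bum"))
```

The ligature fix is safe to run over a whole file. The m/rn swap is not, which is why it only returns suggestions: the idea would be to look up each suggestion whenever spellcheck (or a suspicious sentence) flags a word.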

Part of what I'm doing is using copyediting to take the edge off my urge to find more of the stories - I have already made my requests, and must wait patiently for the results. There's a certain charm to reading a mystery for the first time, and either being surprised by the ending, or feeling clever about guessing it. But subsequent reads are enjoyable in their own way - the warmth of familiarity, the nuances you pick up, the connections you make between stories. I have already started considering some stories noncanonical, of course.

2019-05-24

Tricking The Program Into Working