Putting Book Scan PDFs in Scrapbox 2019
2019-10-08
Notes from reading https://github.com/masui/Book2Scrapbox
- Page images are extracted from the ScanSnap scan output with pdfimages.
- Related: PDF to PNG conversion.
- This works for PDFs made by cutting up a book and scanning it.
- It does not work for PDFs of slides and the like.
- Locally, folders are created and the files are stored under their MD5 hash.
- These are synced to AWS.
- The sync to AWS is not strictly required, because the file contents are sent to Gyazo.
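
As a rough illustration of the MD5-based storage described above, here is a minimal Python sketch. It is not taken from Book2Scrapbox: the function name store_by_md5 and the folder layout (first two hex characters of the digest as the folder name) are assumptions made only for this example.

```python
import hashlib
import shutil
from pathlib import Path


def store_by_md5(src: Path, root: Path) -> Path:
    """Copy src under root, naming it after the MD5 hash of its contents."""
    digest = hashlib.md5(src.read_bytes()).hexdigest()
    # Assumed layout: one folder per two-character hash prefix.
    dest_dir = root / digest[:2]
    dest_dir.mkdir(parents=True, exist_ok=True)
    # Keep the original extension so the file type stays recognizable.
    dest = dest_dir / f"{digest}{src.suffix}"
    shutil.copyfile(src, dest)
    return dest
```

Naming files by content hash means the same page image always lands at the same path, so re-running the conversion does not create duplicates.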
- https://github.com/nishio/Book2Scrapbox
- Use pdftocairo, since slide PDFs cannot be converted to images with pdfimages (pdfimages only extracts images embedded in the PDF, while pdftocairo renders each page).
$ pdftocairo -r 200 -f 0 -jpeg <pdf> pages
- Multiple PDFs are now combined into a single JSON file.
- pdfstojson.rb calls makejson.rb
- I looked into doing this in Python, but it turned out to be achievable by running makejson.rb as a child process.
- Some time after the JSON is ready, download the OCR results from Gyazo and add them.
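
The "child process per PDF, then merge" idea can be sketched in Python as follows. This is only an illustration, not the code in the repository (where pdfstojson.rb does this in Ruby). It assumes each child process prints a Scrapbox import JSON of the form {"pages": [...]} to standard output. The helper names run_child and merge_page_json are hypothetical, and the makejson.rb command line in the comment is a guess.

```python
import json
import subprocess
from typing import Iterable, Sequence


def run_child(cmd: Sequence[str]) -> str:
    """Run a child process and return its stdout; raises if it exits non-zero."""
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout


def merge_page_json(outputs: Iterable[str]) -> dict:
    """Merge several {"pages": [...]} JSON documents into a single one."""
    merged = {"pages": []}
    for text in outputs:
        merged["pages"].extend(json.loads(text).get("pages", []))
    return merged


# Hypothetical usage (the real arguments of makejson.rb may differ):
# merged = merge_page_json(run_child(["ruby", "makejson.rb", pdf]) for pdf in pdfs)
# print(json.dumps(merged, ensure_ascii=False))
```

Keeping the per-PDF step as a separate process means the existing script can be reused unchanged, and only the merge needs new code.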
This page is auto-translated from /nishio/書籍スキャンPDFをScrapboxに置く2019 using DeepL. If something looks interesting but the auto-translated English is not good enough to understand, feel free to let me know at @nishio_en. I'm very happy to spread my thoughts to non-Japanese readers.