Putting Book Scans PDFs in Scrapbox 2019

2019-10-08

Putting book-scan PDFs in Scrapbox
https://www.facebook.com/toshiyukimasui/posts/10157675595687498
- There is a Gem for Gyazo
- masui/Book2Scrapbox: a way to read self-scanned books in Scrapbox
After splitting the PDF into images, a script uploads them to Gyazo Pro.
Gyazo Pro uses Google Cloud Platform's Cloud Vision API for OCR.
OCR takes time, so the OCR data is fetched a while later.
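Because OCR results show up only after a delay, the "fetch it later" step comes down to polling. This is a minimal sketch that assumes a hypothetical `fetch_ocr` callable, which asks Gyazo for one image's OCR text and returns nil while the text is not ready. The actual Gyazo request is not shown, and the retry count and interval are illustrative.

```ruby
# Poll until OCR text for an uploaded image is available, or give up.
# fetch_ocr (hypothetical): callable taking an image id, returning a String or nil.
# sleeper is injectable so the waiting can be skipped in tests.
def wait_for_ocr(image_id, fetch_ocr, attempts: 5, interval: 60, sleeper: ->(s) { sleep(s) })
  attempts.times do |i|
    text = fetch_ocr.call(image_id)
    return text if text && !text.empty?
    # Wait before the next try, but not after the last one.
    sleeper.call(interval) if i < attempts - 1
  end
  nil # still not ready; run again later
end

# Usage with a stand-in fetcher that succeeds on the third call:
calls = 0
fake_fetch = ->(id) { calls += 1; calls < 3 ? nil : "OCR text of #{id}" }
puts wait_for_ocr("img1", fake_fetch, interval: 0)
```

Returning nil instead of raising lets a batch job skip images that are not ready yet and pick them up on a later run.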

Notes from reading https://github.com/masui/Book2Scrapbox

Images are extracted from the ScanSnap scan PDFs with pdfimages.
- Related: PDF to PNG conversion
- This works for PDFs of books that were cut apart and scanned.
- It does not work for PDFs of slides and the like.
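pdfimages copies out the raster images embedded in a PDF. A book that was cut apart and scanned gives one scanned image per page, so this recovers the pages. A slide PDF is mostly vector text and shapes, which pdfimages does not render, so it does not produce page images. Below is a minimal sketch of the step. It assumes poppler-utils is installed, and the `-j` flag (keep JPEG-encoded images as .jpg) is my choice, not necessarily Book2Scrapbox's.

```ruby
require "open3"

# Build the pdfimages command line.
# -j: save DCT (JPEG) images as .jpg files instead of converting them to PPM.
def pdfimages_command(pdf_path, out_prefix)
  ["pdfimages", "-j", pdf_path, out_prefix]
end

# Run pdfimages and return the extracted files (named <prefix>-000.jpg, ...).
def extract_images(pdf_path, out_prefix)
  _out, err, status = Open3.capture3(*pdfimages_command(pdf_path, out_prefix))
  raise "pdfimages failed: #{err}" unless status.success?
  Dir.glob("#{out_prefix}-*").sort
end
```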
Locally, a folder named with an MD5 hash is created and the images are stored in it.
That folder is then synced to AWS.
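Naming the folder after a hash means the same PDF always maps to the same folder, so rerunning the script does not create duplicates. This is a minimal sketch that assumes the hash is taken over the PDF's bytes. Which input Book2Scrapbox actually hashes is not stated here, and `hash_dir_for` is a hypothetical name.

```ruby
require "digest/md5"
require "fileutils"

# Create (if needed) a folder named after the MD5 hex digest of the PDF's bytes.
def hash_dir_for(pdf_path, base_dir = ".")
  digest = Digest::MD5.file(pdf_path).hexdigest
  dir = File.join(base_dir, digest)
  FileUtils.mkdir_p(dir) # no-op if the folder already exists
  dir
end
```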
- The AWS CLI must be installed: AWS Command Line Interface (CLI: an integrated tool to manage AWS services) | AWS
- Installing the AWS CLI - AWS Command Line Interface
  - The instructions are very thorough.
- AWS CLI Configuration - AWS Command Line Interface
- aws s3 sync
  - Deleting files locally does not delete anything on S3.
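By default, `aws s3 sync` only uploads new and changed files, which is why local deletions do not reach S3. The CLI's `--delete` option makes S3 mirror them. This is a minimal sketch of the command. The bucket and prefix are placeholders, and the `s3_sync_command` / `s3_sync` helpers are hypothetical names.

```ruby
# Build the `aws s3 sync` command line; bucket and prefix are placeholders.
# Without --delete, objects removed locally stay on S3.
def s3_sync_command(local_dir, bucket, prefix, delete: false)
  cmd = ["aws", "s3", "sync", local_dir, "s3://#{bucket}/#{prefix}"]
  cmd << "--delete" if delete
  cmd
end

# Run it (requires the AWS CLI to be installed and configured).
def s3_sync(local_dir, bucket, prefix, delete: false)
  system(*s3_sync_command(local_dir, bucket, prefix, delete: delete)) or raise "aws s3 sync failed"
end
```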
Syncing to AWS is not actually required.
- The file contents themselves are sent to Gyazo.
https://github.com/nishio/Book2Scrapbox
- Use pdftocairo, since pdfimages cannot turn slides into images
  - $ pdftocairo -r 200 -f 0 -jpeg <pdf> pages
    - see PDF to PNG conversion
- Multiple PDFs are now combined into a single JSON
- pdfstojson.rb calls makejson.rb
  - I looked into how to do this in Python, but calling makejson.rb as a child process was enough.
- A while after the JSON is ready, download the OCR results from Gyazo and add them.
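The fork's steps above can be sketched as follows. This is not the fork's actual code, and it rests on several assumptions:
- Each per-PDF JSON uses Scrapbox's import shape `{"pages" => [{"title", "lines"}]}`.
- makejson.rb prints its JSON to stdout, and its arguments are hypothetical.
- OCR text is matched to pages by title.

All helper names are hypothetical.

```ruby
require "json"
require "open3"

# Same options as the pdftocairo command above; files come out as pages-<N>.jpg.
# pdftocairo renders whole pages, vector text included, unlike pdfimages.
def pdftocairo_command(pdf_path, out_root = "pages")
  ["pdftocairo", "-r", "200", "-f", "0", "-jpeg", pdf_path, out_root]
end

# Order page images by page number, so pages-10 sorts after pages-9
# whether or not the numbers are zero-padded.
def sort_pages(paths)
  paths.sort_by { |path| path[/-(\d+)\.jpe?g\z/, 1].to_i }
end

# Run makejson.rb as a child process and return what it prints.
def run_makejson(args)
  out, status = Open3.capture2("ruby", "makejson.rb", *args)
  raise "makejson.rb failed" unless status.success?
  out
end

# Combine several Scrapbox import JSON strings into one.
def merge_scrapbox_json(json_strings)
  pages = json_strings.flat_map { |s| JSON.parse(s).fetch("pages") }
  JSON.generate({ "pages" => pages })
end

# Append OCR text, keyed by page title, to the end of each page's lines.
def add_ocr(data, ocr_by_title)
  data["pages"].each do |page|
    text = ocr_by_title[page["title"]]
    page["lines"].concat(text.split("\n")) if text
  end
  data
end
```

Each per-PDF JSON stays independent, so merging is just concatenating the `pages` arrays.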

This page is auto-translated from /nishio/書籍スキャンPDFをScrapboxに置く2019 using DeepL. If something looks interesting but the auto-translated English is not good enough to understand, feel free to let me know at @nishio_en. I'm very happy to spread my thoughts to non-Japanese readers.

(C)NISHIO Hirokazu / Converted from Markdown (en)
Source: [GitHub] / [Scrapbox]