Import Hatena Diaries to Scrapbox with past dates
from Machine writes in Scrapbox
Objective
- When I search in Scrapbox, my past blog posts don't show up. I think they should.
- The diary was automatically migrated to Hatena Blog, but since I'm not on a paid plan, it shows ads.
- So I feel uneasy linking to those pages.
- I want to be able to use bracket links for past articles as well.
2021-05-04 Imported Hatena Diary into Scrapbox with past dates using a script
2021-12-20
Back in 2018, I had already written a script to convert the exported XML into Scrapbox-format JSON.
- Importing Hatena Diary into Scrapbox
- Why hadn't I imported it into this project?
- Conversion of the formatting was incomplete.
- I didn't want the top page to be flooded with machine-generated pages.
- → I realized that if the update timestamp is set in the past, the imported pages won't flood the top page.
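A minimal sketch of building such an import file, assuming Scrapbox's JSON import honors per-page `created` and `updated` fields given in Unix seconds, as they appear in the project's JSON export. `make_page`, `build_import`, and the title `hatena-2004-01-01` are hypothetical names for illustration:

```python
import json
from datetime import datetime, timezone


def make_page(title, lines, written_at):
    """Build one page entry for a Scrapbox import file.

    Assumption: the importer honors `created`/`updated` (Unix seconds),
    so a past `written_at` keeps the page off the top page.
    """
    ts = int(written_at.timestamp())
    return {
        "title": title,
        # In Scrapbox JSON, the first element of `lines` is the title.
        "lines": [title, *lines],
        "created": ts,
        "updated": ts,
    }


def build_import(pages):
    # Top-level structure of an import file: {"pages": [...]}
    return json.dumps({"pages": pages}, ensure_ascii=False, indent=2)


page = make_page(
    "hatena-2004-01-01",  # hypothetical machine-generated title
    ["diary body goes here"],
    datetime(2004, 1, 1, tzinfo=timezone.utc),
)
print(build_import([page]))
```

If the assumption about the importer holds, both timestamps point to the original writing date, so these pages sort below anything edited recently.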
Formatting
- It looks like the entity reference `&gt;` is being dropped when I read the file with bs4.
- I doubt that's really what's happening, but something did go missing and I couldn't figure out how to fix it.
- It wasn't worth digging into, so I parsed the file myself.
- Hatena notation is not converted to Scrapbox notation.
- Instead, each entry is placed as-is inside Scrapbox code notation.
- As long as it shows up in search and is readable, that's all that matters.
- Run the text through `html.unescape`.
- I was going to use each blog post's title as the page title, but decided against it because it didn't seem necessary.
- All page titles are machine-generated.
- This makes the pages easy to remove mechanically if I decide after the import that it wasn't a good idea.
- Q: Wouldn't it be better to make a backup before importing?
- A: If I decide to undo the import after living with it for a while and editing various pages, restoring from a backup would throw away those edits.
- Safe overwrite policy
- It's sad when the script is updated, the data is reconverted and overwritten, and human-written text is lost.
- That fear discourages keeping the import up to date.
- Don't edit machine-generated pages; when you want to change one, create a new page with a good title and copy the content there.
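A sketch of the conversion described in this section, assuming each entry in the Hatena Diary export looks roughly like `<day date="YYYY-MM-DD" title="..."><body>...</body></day>`. It uses a regular expression instead of bs4, decodes entity references with `html.unescape`, keeps the Hatena notation untouched inside a Scrapbox `code:` block so it stays searchable, and gives every page a machine-generated title. The title format `hatena-YYYY-MM-DD`, the `code:hatena` label, and the function names are hypothetical, not necessarily what the author used:

```python
import html
import re

# Assumed shape of one entry in the Hatena Diary export XML.
DAY_RE = re.compile(
    r'<day date="(\d{4})-(\d{2})-(\d{2})"[^>]*>\s*<body>(.*?)</body>\s*</day>',
    re.DOTALL,
)


def parse_days(xml_text):
    """Yield (year, month, day, body) tuples with entities decoded."""
    for m in DAY_RE.finditer(xml_text):
        year, month, day, body = m.groups()
        # Decode entity references such as &gt; and &amp; ourselves.
        yield year, month, day, html.unescape(body)


def to_scrapbox_lines(year, month, day, body):
    """Return Scrapbox lines: title first, then the body as a code block."""
    title = f"hatena-{year}-{month}-{day}"  # hypothetical machine title
    # No notation conversion: every body line is indented under `code:`.
    code = [" " + line for line in body.strip("\n").split("\n")]
    return [title, "code:hatena", *code]


sample = """<diary>
<day date="2004-01-01" title="">
<body>
*New Year
a &gt; b
</body>
</day>
</diary>"""

for parts in parse_days(sample):
    print("\n".join(to_scrapbox_lines(*parts)))
```

A regex is fragile for XML in general (for example, a `>` inside the `title` attribute would break the match), but for a one-off import of a known file it avoids the entity-handling surprise.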

- The creation date and time are correctly set to three years ago.
- That said, simply keeping the pages off the top page might have been enough; there was no need to match the exact creation date and time.
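```python
import datetime
def timestamp(*args):
    # Parts (year, month, day, ...) may be strings; naive datetime = local time.
    return datetime.datetime(*map(int, args)).timestamp()
```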
- 2021-06-29 Addendum: I still prefer the current approach of using the date and time each article was actually written, because search results then appear in chronological order, which I find comfortable.
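The `timestamp` helper above depends on the timezone of the machine running it, because `datetime.timestamp()` treats a naive datetime as local time. A variant that pins the timezone explicitly, assuming the diary's dates and times are Japan Standard Time (UTC+9), gives the same value wherever the script runs; `timestamp_jst` is a hypothetical name:

```python
import datetime

# Assumption: diary dates/times are JST (UTC+9).
JST = datetime.timezone(datetime.timedelta(hours=9))


def timestamp_jst(*args):
    # Accepts numeric parts as strings or ints, e.g. ("2021", "05", "04").
    parts = map(int, args)
    return int(datetime.datetime(*parts, tzinfo=JST).timestamp())


print(timestamp_jst("2021", "05", "04"))  # → 1620054000 (2021-05-04 00:00 JST)
```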
Index
I made an index page but deleted it because I realized I would never use it.
Scale
- 120,000 lines of XML
- 220,000 lines of JSON
- 1,500 pages
Overwriting
- Plan: allow overwriting a page only if its modification date/time has not changed.
- Plan: check for conflicts on import.
- Actual: pages are always overwritten.
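The planned rule (overwrite only if the modification time is unchanged) could be expressed as a small check. This sketch assumes that the importer writes the diary's past date as `updated`, that Scrapbox bumps `updated` whenever someone edits a page, and that the current values are available from a project export; `plan_import` and the data layout are hypothetical. The version actually used skips all of this and always overwrites.

```python
def plan_import(existing_pages, new_pages):
    """Split new_pages into safe overwrites and conflicting titles.

    existing_pages: {title: {"updated": int}} taken from a project export.
    new_pages: page dicts carrying the "updated" value the importer
    writes (the diary's past date), so an untouched page matches it.
    """
    safe, conflicts = [], []
    for page in new_pages:
        current = existing_pages.get(page["title"])
        if current is None or current["updated"] == page["updated"]:
            # New, or untouched since import: overwriting loses nothing.
            safe.append(page)
        else:
            # Edited by a human after import: keep their text.
            conflicts.append(page["title"])
    return safe, conflicts


existing = {
    "hatena-2004-01-01": {"updated": 1072915200},  # untouched
    "hatena-2004-01-02": {"updated": 1620054000},  # edited later
}
new = [
    {"title": "hatena-2004-01-01", "updated": 1072915200},
    {"title": "hatena-2004-01-02", "updated": 1073001600},
    {"title": "hatena-2004-01-03", "updated": 1073088000},
]
safe, conflicts = plan_import(existing, new)
print([p["title"] for p in safe], conflicts)
```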
Try it on an empty project
Can a bot also join the project via a script?
Request URL: "https://scrapbox.io/api/projects/{project}/invitations/{key}"
Request Method: POST
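A sketch of how a bot script might send that request with Python's standard library. The endpoint and the POST method come from the captured request above; authenticating with the `connect.sid` session cookie and sending an `X-CSRF-TOKEN` header are assumptions about how Scrapbox's API expects logged-in requests, and `build_join_request` is a hypothetical name. The function only builds the request; passing it to `urllib.request.urlopen` would actually send it.

```python
import urllib.parse
import urllib.request

API = "https://scrapbox.io/api/projects/{project}/invitations/{key}"


def build_join_request(project, key, sid, csrf_token):
    """Build (but do not send) the POST that accepts an invitation."""
    url = API.format(
        project=urllib.parse.quote(project, safe=""),
        key=urllib.parse.quote(key, safe=""),
    )
    return urllib.request.Request(
        url,
        data=b"",  # empty body
        method="POST",
        headers={
            # Assumption: the bot's session cookie and CSRF token.
            "Cookie": f"connect.sid={sid}",
            "X-CSRF-TOKEN": csrf_token,
        },
    )


req = build_join_request("my-project", "abc123", "s%3Axxxx", "token")
print(req.get_method(), req.full_url)
# To actually join: urllib.request.urlopen(req)
```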
This page is auto-translated from /nishio/はてなダイアリーを過去の日付でScrapboxにインポート using DeepL. If something looks interesting but the auto-translated English is not good enough to understand, feel free to let me know at @nishio_en. I'm very happy to spread my thoughts to non-Japanese readers.