NISHIO Hirokazu[English][日本語]

Visualization of Parliamentary Proceedings

2025-03-06 An experiment to see how much human labor can be outsourced to AI

Initial DeepResearch

nishio.iconLook into how to get data from Japanese parliamentary proceedings, can it be done in Python or something? I would like to know roughly what is possible to develop ideas. Further DeepResearch An API that can get JSON with GET is attractive, especially for keyword searches. Let's find out more about this API, gather information about what has been created using this API in the past, and describe "what is possible". Next o3-high-mini I would like a script to read the above log and output it as CSV formatted data to visualize the discussion on a particular topic at a later stage. First, I need to compile the information needed to create that, and ask questions if I'm missing any information. The way to select "specific topics" is by keyword search against the agenda and statements, and the latter stage is embedding using LLM, so the main focus is on the statement data. It would be nice to have attribute data (speaker and political party) in an additional field in the CSV that would allow for conversation flow and additional analysis. Further o3-high-mini Multiple keywords are retrieved by OR search, argparse, and specifying the time period is indeed important. Let's start with a short one as default, and try to make it gradually wider only when the number is small. The CSV remark ID is an integer value with the name comment-id and the remark text in comment-body. The rest of the names can be appropriate. argparse with filename/save location: default Encoding: UTF-8 without BOM Separator: comma delimited Error Handling: Errors are OK for immediate death Data cleansing: once without. I'll make something that works and then observe. Can you make it with this? If you can make it, make it.

https://chatgpt.com/share/67c9b1cb-3c14-8011-b4d8-7efe11d804af

Run the generated script as appropriate.

Searching for keyword 'earned income tax credit'... 001 JSON parsing failed That's no good!

Give this log to Devin for implementation.

  • They can't read the ChatGPT logs directly, I gave them a copy and paste.
  • First, I implemented it in the style of parsing XML, saying "The Parliamentary Meeting Minutes API returns data in XML format.
  • nishio.iconCan't you make it JSON?

  • Can't you do it in JSON? I asked, I checked the documentation and changed the parameters.
  • nishio.iconTry to see if you can set a time period for the keyword "income tax credit" to gather about 1,000 entries.

Devin.iconAfter confirming that the JSON format support was working properly, we attempted to retrieve data using the keyword "income deduction" and retrieved 846 records. Since the target of 1,000 records was not reached, we decided to try again with a further expanded time period.

It's done.

nishio.icon No need to support XML format, simplify scripts

devin.icon Test Results I ran the test for the "Earned Income Credit" keyword in a simplified script: python3 kokkai_api.py --keywords "income_deduction" --start-date "2006-01-01" --end-date "2023-12-31" --output "income_deduction.csv" Results: 1058 records successfully retrieved Simpler code while maintaining the same functionality as before

I tried to put it in visualization, but trouble.

  • →I was wrong about the field information required by the visualization system in the latter stage.

As of 3/7, the latest data from the Library of Congress' Proceedings Search is 2/19, and there appears to be a 2-week delay

Published.


This page is auto-translated from /nishio/国会議事録可視化 using DeepL. If you looks something interesting but the auto-translated English is not good enough to understand it, feel free to let me know at @nishio_en. I'm very happy to spread my thought to non-Japanese readers.


(C)NISHIO Hirokazu / Converted from Markdown (en)
Source: [GitHub] / [Scrapbox]