NISHIO Hirokazu

Generative Social Choice

PDF

from /blu3mo-public/deliberative-support-systems-research-survey.

v3

nishio.icon
  • The KJ method gathers items into groups and then labels them; this method generates the labels first.
  • With an LLM, this "thinking" can be done without having to articulate the methodology.
  • Generate a label, then determine whether each participant falls into the set the label represents.
  • By removing (the opinions of) those who fall into the set, we can focus on the opinions of the minority not represented by the "majority opinion."
  • This is similar to the KJ-method step of collecting only the [disengaged monkeys](/en/disengaged%20monkeys) that did not naturally join any group and trying to find a connection among them.

v2

o1 Pro.icon 【Gen (Generative Query)】 Role: The Gen query takes a set of participants S and an integer r as arguments and has the LLM generate "a statement α such that at least r participants in S derive high utility from it." In essence, this query performs the operation "generate the best new statement that can satisfy the given group of participants."

Necessity and Significance:

  • Traditional social choice only lets participants choose from a fixed set of candidates. Here, the LLM can understand participants' free-text responses and generate new statements that are "acceptable to participant group S," thereby discovering more diverse intermediate positions and statements closer to the participants' latent consensus.
  • The Generative Query makes it possible to obtain consensus statements and compromise proposals that did not previously exist, allowing democratic aggregation of opinions on complex issues that has been difficult to achieve in the past.

Examples of goals and execution steps for Gen queries:

  • At a given step, the system determines the set S of participants who are not yet satisfied and a parameter r indicating how many participants in S should, at minimum, be satisfied.
  • Prompt the LLM with information summarizing the responses that members of S have previously given (their free-text responses or summaries of existing evaluations) and ask it to "devise a statement that will satisfy at least r of the participants in S."
  • The LLM generates several candidate statements; the Disc query then estimates each statement's utility for the participants in S, and the statement that best serves the purpose is selected.
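The flow above can be sketched as follows. This is a minimal illustration, not the paper's implementation: `llm_generate` and `disc_utility` are hypothetical stand-ins for the actual LLM calls and return placeholder values here.

```python
# Sketch of one Gen-query round (hypothetical helper names; a real
# system would call an LLM for both generation and utility estimation).

def llm_generate(responses, r):
    """Stand-in for the LLM: propose candidate statements intended to
    satisfy at least r of the participants whose responses are given."""
    return [f"candidate from {len(responses)} responses (target {r})"]

def disc_utility(response, statement):
    """Stand-in for the Disc query: estimated utility of `statement`
    for the participant who wrote `response`."""
    return 1.0  # placeholder score

def gen_query(responses, r):
    """Generate candidates, score them with Disc, and return the
    statement whose r-th highest utility is largest, so that at least
    r participants are well served."""
    candidates = llm_generate(responses, r)

    def rth_highest_utility(stmt):
        utilities = sorted((disc_utility(resp, stmt) for resp in responses),
                           reverse=True)
        return utilities[r - 1] if len(utilities) >= r else float("-inf")

    return max(candidates, key=rth_highest_utility)
```

Ranking candidates by the r-th highest utility directly encodes the goal "at least r participants in S have high utility."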

Implementation Devices:

  • Simply passing all of S's information to the LLM at once may be infeasible due to context-window constraints, and a single generation does not always produce an optimal statement.
  • The actual experiments therefore use an "ensemble strategy," generating candidate statements along multiple paths:
    • prompting with all members together
    • prompting with small representative subsets (e.g., groups of 5 or 10 participants)
    • prompting with groups of like-minded participants selected via clustering
  • A Disc query is then applied to all the resulting candidates, and the statement with the greatest utility is selected.
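The ensemble strategy can be sketched as below, with `llm_generate` and `disc_utility` as assumed interfaces passed in as functions; the clustering-based path mentioned above is omitted for brevity.

```python
import random

def ensemble_gen(responses, r, llm_generate, disc_utility,
                 subset_size=5, n_subsets=3):
    """Generate candidate statements along several paths (whole group,
    random small subsets), then pick the one with the highest total
    Disc score. `llm_generate(responses, r)` returns a list of
    candidate statements; `disc_utility(response, statement)` returns
    a numeric utility."""
    candidates = list(llm_generate(responses, r))  # all members at once
    for _ in range(n_subsets):                     # small subsets
        subset = random.sample(responses, min(subset_size, len(responses)))
        candidates += llm_generate(subset, r)
    # Disc query over all candidates; keep the highest-utility one
    return max(candidates, key=lambda s: sum(disc_utility(resp, s)
                                             for resp in responses))
```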

v1

o1 Pro.icon [Summary] This study proposes a new framework, "Generative Social Choice," that combines social choice theory (the theory of aggregating the preferences of an entire society to make decisions) with the generative capabilities of LLMs (large language models). Conventional social choice theory models voting and aggregation over a small number of predefined alternatives, making it difficult to handle free-form opinions and diverse options. In contrast, this study uses an LLM to:
  • generate new textual options (statements), not envisioned in advance, from individuals' free-form opinions;
  • design a decision process that satisfies strict conditions such as proportional representation, with theoretical steps that treat the LLM as an "oracle" (an ideal referee);
  • experimentally approximate that oracle functionality with an actual LLM.
The result is a two-step method: theoretical design assuming an ideal oracle, followed by approximate implementation with a real LLM.

This method summarizes a large number of free-text opinions into a small slate of representative statements and demonstrates that the slate can adequately represent participants' diverse values. In the experiment, open-ended responses on chatbot personalization were collected from 100 U.S. residents, and the LLM was used to generate and select five representative opinion statements. A subsequent validation survey of another 100 respondents found that 93% indicated their opinion was represented at the "mostly" or "perfectly" level.

[Commentary] The innovation of this study is that it uses the generative capabilities of LLMs to transform the free-text opinions submitted by participants into dynamically generated alternatives, and to extract from them a fair slate satisfying a representativeness condition (justified proportional representation). This makes it possible to democratically aggregate opinions on complex, open-ended policy issues and values. The research proceeds on two fronts, theoretical (fairness guarantees under the assumption of access to an ideal oracle) and empirical (approximate implementation and evaluation with actual LLMs), and more capable future models are expected to make representativeness easier and more accurate to ensure.


[Overall flow] This study presents a "democratic" process for summarizing participants' open-ended opinions (text) into representative statements and selecting them according to a social-choice condition called Balanced Justified Representation (BJR).

This process can be divided into the following two steps:

  • Theoretical stage assuming access to an ideal oracle
    • Assume the LLM is a "perfect oracle" whose queries can retrieve the ideal statement for any group. Under this assumption, design a statement-selection algorithm that satisfies proportional representation (BJR) and mathematically prove its validity.
  • Approximate implementation and verification with actual LLMs
    • In reality, an LLM is not a perfect oracle. Therefore, the ideal queries are implemented approximately with an LLM, and experiments verify how closely oracle-like behavior can be achieved.

[Query types used] This method prepares two types of queries to the LLM.

  • Discriminative Query (Disc):
    • Given an agent's (participant's) free-text responses and an arbitrary statement, the LLM infers how much utility (degree of preference) the agent has for that statement.
    • This allows the system to extrapolate an agent's preferences from past responses and evaluate new statements the agent has not explicitly rated.
  • Generative Query (Gen):
    • A query that has the LLM generate a statement that maximally satisfies a given set of agents S; for example, a statement that maximizes the minimum utility within S, or one whose utilities exceed a certain threshold.
    • The Gen query allows the system to actively "create" intermediate or compromise opinions that did not exist beforehand.
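As a concrete illustration of the Disc side, here is a minimal sketch of prompting and answer parsing, assuming the five-level verbal scale used in the experiment ("not at all" through "perfectly"). The prompt wording and helper names are hypothetical, not taken from the paper.

```python
# Five-level rating scale (the experiment reports "mostly"/"perfectly"
# as its top two levels; the full wording here is an assumption).
SCALE = ["not at all", "poorly", "somewhat", "mostly", "perfectly"]

def disc_prompt(agent_response, statement):
    """Build a hypothetical Disc-query prompt asking the LLM to rate,
    on the five-level verbal scale, how well `statement` matches the
    opinion expressed in `agent_response`."""
    return (
        "A participant wrote the following opinion:\n"
        f"{agent_response}\n\n"
        "How well would this participant agree with the statement below?\n"
        f"{statement}\n"
        f"Answer with one of: {', '.join(SCALE)}."
    )

def parse_utility(llm_answer):
    """Map the LLM's verbal rating to an integer utility in 0..4."""
    return SCALE.index(llm_answer.strip().lower())
```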

[Algorithm satisfying the Balanced Justified Representation (BJR) condition] The paper uses an extended procedure similar to Greedy Approval Voting. The outline is as follows.

  • Define the set of all participants N and the slate size k (number of summary statements).
  • Repeat steps:
    • Consider a subset S of participants who are not yet represented, and use a Gen query to generate statements that can represent about n/k of them.
    • Use a Disc query to evaluate how strongly the generated statement is supported by the participants in S, then add the statement to the slate.
    • Participants who are sufficiently satisfied with the statement are considered "represented" and are removed from S.
  • Repeating this procedure k times yields k statements; the paper theoretically proves that the resulting slate satisfies BJR.
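A minimal sketch of the greedy loop above, with `gen_query` and `disc_utility` as assumed interfaces; the satisfaction `threshold` that decides when a participant counts as represented is also an assumption, not a detail from the paper.

```python
def build_slate(participants, k, gen_query, disc_utility, threshold):
    """Greedily build a slate of k statements in the spirit of the
    paper's algorithm: each round, generate a statement for the
    still-unrepresented participants, then remove everyone it
    satisfies. `gen_query(group, r)` returns one statement;
    `disc_utility(p, stmt)` returns a numeric utility."""
    n = len(participants)
    r = max(1, n // k)              # aim to cover about n/k participants
    remaining = list(participants)
    slate = []
    for _ in range(k):
        if not remaining:           # everyone is already represented
            break
        stmt = gen_query(remaining, min(r, len(remaining)))
        slate.append(stmt)
        # participants satisfied by the statement count as represented
        remaining = [p for p in remaining
                     if disc_utility(p, stmt) < threshold]
    return slate
```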

Implementation

  • Because of prompt-length limits, an LLM cannot be fed all participants' data at once. The proposed method therefore does not process everyone's data in a single call, but relies on sampling and approximation.
  • A single LLM call rarely yields the optimal statement for a Generative Query. This study therefore uses an "ensemble approach": statement proposals are generated in multiple patterns, and the Discriminative Query selects the highest-scoring one.

Evaluation Experiment

  • In the empirical experiment, 100 U.S. residents gave free-text responses on the topic of chatbot personalization, and five representative statements were then extracted using this method. These five statements were presented to another 100 respondents, who rated how well the statements matched their own opinions; 93% answered "mostly" or "perfectly."

Summary: The Discriminative Query uses the LLM to infer participants' ratings of arbitrary statements; the Generative Query uses the LLM to generate new statements suited to a given participant set. Combining these queries in a greedy selection algorithm yields a slate that achieves BJR. An approximate implementation with actual LLMs was built, and experiments confirmed high representativeness.

In this way, the two types of LLM queries provide a concrete method for proportionally representative consensus building from free-text responses.


This page is auto-translated from [/nishio/Generative Social Choice](https://scrapbox.io/nishio/Generative Social Choice) using DeepL. If you find something interesting but the auto-translated English is not good enough to understand it, feel free to let me know at @nishio_en. I'm very happy to spread my thoughts to non-Japanese readers.


(C)NISHIO Hirokazu / Converted from Markdown (en)
Source: [GitHub] / [Scrapbox]