PREreview of Heterogeneity in charitable giving preferences from over 2 million decisions worldwide
- Published
- DOI
- 10.5281/zenodo.20826544
- License
- CC BY 4.0
Review of “Heterogeneity in charitable giving preferences from over 2 million decisions worldwide”, https://osf.io/preprints/osf/28g3z_v2 by Rene Bekkers, r.bekkers@vu.nl, http://orcid.org/0000-0002-4403-7222
24 June 2026
Preliminaries
To be published on PREreview, https://prereview.org/preprints/doi-10.31219-osf.io-28g3z_v2/
In this review I took a developmental approach, seeking to identify the strengths of the research and to suggest ways to mend weaknesses. I did not use any form of artificial intelligence in preparing this review.
As a signatory to the Peer Reviewers' Openness Initiative, I have deliberately signed my review.
My expertise is on charitable giving, not so much on experimental philosophy. I’ve conducted scenario experiments measuring donation intentions and social norms on charitable giving.
As a condition for accepting the invitation to review this paper for a journal, I requested access to the data and code for this paper. The authors provided a view-only link to an OSF repository containing the data and code. When I attempted to verify the results reported in the manuscript with these materials, I ran into a difficulty: the readme file did not describe the code files. It was unclear to me what each code file does, and in which order the files should be run.
The repository also contained two preregistrations for the study on Aspredicted, which I’ve also reviewed.
Overall assessment
The manuscript reports fascinating results on tradeoffs between helping different individuals from an impressively large number of decisions in an online experiment with a large number of conditions and participants recruited through a website. The main strengths of the study are the very large sample of countries and individuals, and the number of conditions included in the experimental design. Weaknesses of the manuscript are the constraints on generality, the lack of realism in the decision situations, the lack of theory and hypotheses, the divergence between hypotheses stated in the preregistrations and the analyses reported in the manuscript, confounding, a lack of manipulation checks, and selective citation of previous research.
Strengths
Sample size: the very large sample size allows for precise estimates of main effects and interactions between scenario characteristics as well as between scenario and participant characteristics.
Generalization across countries: the availability of data from participants in a large number of countries allows for estimates of generalizability and country level heterogeneity in treatment effects.
Multiple conditions: the design of the game includes a variety of scenario characteristics, allowing for simultaneous tests of multiple conditions as well as interactions between them.
Incentivized tasks: the manuscript mentions that participants in the first weeks were incentivized financially. SI 1.3 explains that in the first three weeks, two rewards of $1000 were offered.
Weaknesses
Sample composition: it is not clear which population(s) the participants represent. The manuscript mentions only in the last paragraph of the main text that participants are a non-random, self-selected group. Those interested in effective altruism tend to be young (half of the survey participants are younger than 24), to have completed higher levels of education, to be more prosocial in terms of previous giving and volunteering, to be more rational in the sense of striving for consistency in decision making, and to have higher intelligence. Though the manuscript presents some statistics on the distribution of these characteristics in other populations, statistical tests are lacking. For the countries with the largest numbers of observations, comparisons with population characteristics from census or global survey data would help to indicate how selective the study participants are.
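To illustrate the kind of comparison I have in mind, here is a minimal sketch in Python. It tests whether the share of a group in the sample (for example, participants younger than 24) differs from a reference share in the population, and it also reports Cohen's h as an effect size, because with samples this large almost any difference will be statistically significant. The function name and the numbers in the usage example are hypothetical; they are not taken from the manuscript or from any census.

```python
import math


def compare_share_to_population(k, n, population_share):
    """One-sample z-test of a sample proportion against a reference share.

    k: number of sample members in the group (e.g. younger than 24)
    n: sample size
    population_share: reference proportion, e.g. from census data (assumed known)
    """
    sample_share = k / n
    # Standard error under the null hypothesis that the sample share
    # equals the population share.
    se = math.sqrt(population_share * (1 - population_share) / n)
    z = (sample_share - population_share) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    # Cohen's h: difference between arcsine-transformed proportions.
    h = 2 * math.asin(math.sqrt(sample_share)) - 2 * math.asin(math.sqrt(population_share))
    return {"sample_share": sample_share, "z": z, "p_value": p_value, "cohens_h": h}


if __name__ == "__main__":
    # Hypothetical numbers for one country: 5,000 of 10,000 survey
    # participants are younger than 24, against an assumed 30% in the census.
    print(compare_share_to_population(5000, 10000, 0.30))
```

Reporting the effect size per country and per characteristic would show where the sample is most selective, which is more informative than the p-value alone.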
Sample recruitment: while the sample size is impressive, the way participants were recruited is likely to have affected their decisions. The manuscript reports that a large proportion of participants were likely to have completed moral dilemmas before they entered the current study. It is not clear which message the participants read on the Moral Machine website before they could enter the current study. I recommend that the manuscript explain in SI section 1.4 what that message was. The MIT news article mentions that the My Goodness game “uses hypothetical choices and real cash prizes to educate people on how to make the most of their generosity”. This headline is likely to attract interest from those who would like to make the most of their generosity – that is, people who consider themselves to be generous, and are interested in a greater impact of their donations.
Having previously considered their moral responsibilities regarding which lives to save, participants are likely to also view the decisions to allocate funds in the scenarios in the current study from a moral perspective. Furthermore, the roughly 8% of all 2.3 million participants in the Moral Machine who continued to the current study are likely to be a selective group with above-average interest in charitable giving and prosocial motivation.
The 65,583 survey participants, in turn, are likely to be a selective quarter of all 257,019 participants in the My Goodness game. The invitation “Would you like to help us better understand your judgement?” leads to further selectivity with respect to prosocial motivation and interest in moral decision making.
The manuscript carefully avoids generalizing statements about people, and instead reports about the choices of participants. Yet the manuscript could clarify better – and earlier – that the choices analyzed do not reflect human universals, in contrast to the suggestion in the abstract of a “cross-cultural universality of high-level altruism”. The level of altruism in the decisions is likely to be inflated by the sample composition and previous tasks completed by participants. A constraints on generality statement (Simons, Shoda, & Lindsay, 2017) would be in order.
Hypothetical scenarios. A disadvantage of the scenarios is that they are hypothetical situations. Except for the 18,820 decisions in the 941 sessions in the first three weeks, the choices participants made did not have behavioral consequences, neither in terms of costs to themselves, nor in terms of benefits for others. The manuscript could be improved by formally testing the differences between effects of conditions in choices in the first three weeks and in choices in the following weeks. Such a test should control for participant characteristics, as they are likely to differ between early and later participants.
Based on a strong line of thinking in economics (Cherry, Frykblom & Shogren, 2002), it may be argued that the degree of altruism expressed towards strangers is likely to have been overestimated because the choices participants made had no behavioral consequences. The finding that participants strongly preferred allocating money to themselves rather than saving the life of a single stranger, even when these preferences had no consequences, illustrates the power of self-interest. With the results of the statistical test for differences between the first three weeks and the remaining weeks, the manuscript could discuss these claims.
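As a rough illustration of such a test, the sketch below compares the effect of one condition on the probability of choosing a focal option between the incentivized and the hypothetical period. It is a deliberately simplified version under stated assumptions: it treats decisions as independent (ignoring clustering within sessions) and does not adjust for participant characteristics, which a full analysis would do, for instance in a regression with a condition-by-period interaction and covariates. The function name and all counts are hypothetical.

```python
import math


def effect_difference_test(inc_treat, inc_ctrl, hyp_treat, hyp_ctrl):
    """Test whether a treatment effect differs between two periods.

    Each argument is a tuple (k, n): k decisions for the focal option
    out of n decisions in that cell.
    inc_* = incentivized period, hyp_* = hypothetical period.
    """
    def share_and_variance(k, n):
        p = k / n
        return p, p * (1 - p) / n

    p_it, v_it = share_and_variance(*inc_treat)
    p_ic, v_ic = share_and_variance(*inc_ctrl)
    p_ht, v_ht = share_and_variance(*hyp_treat)
    p_hc, v_hc = share_and_variance(*hyp_ctrl)

    # Difference-in-differences on the probability scale.
    diff = (p_it - p_ic) - (p_ht - p_hc)
    # The four cells are assumed independent, so their variances add up.
    se = math.sqrt(v_it + v_ic + v_ht + v_hc)
    z = diff / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided
    return diff, se, z, p_value


if __name__ == "__main__":
    # Hypothetical counts, not taken from the manuscript.
    diff, se, z, p = effect_difference_test(
        inc_treat=(700, 1000), inc_ctrl=(500, 1000),
        hyp_treat=(55000, 100000), hyp_ctrl=(50000, 100000),
    )
    print(f"difference in effects = {diff:.3f}, SE = {se:.3f}, z = {z:.2f}, p = {p:.4f}")
```

Because the incentivized period contains far fewer decisions than the hypothetical period, the standard error is dominated by the incentivized cells; the confidence interval around the difference is therefore at least as informative as the p-value.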
A second problem with the scenarios, which the manuscript acknowledges in the limitations section, is their limited realism. In practice, people rarely encounter situations in which they have to choose between allocating money to themselves or saving the life of a single relative, or between saving different numbers of strangers. A benefit of the design is indeed that it allows the effects of multiple conditions on how participants think about such decisions to be studied. However, it is not clear what these insights imply if the decisions do not correspond to real-life decisions about donations. This limitation seems beyond remedy.
Confounding circumstances. A prominent condition in charitable decision making is recipient need. Typically, in charitable appeals the needs of strangers far away are many times greater - at greater opportunity costs in terms of survival, health, and wellbeing - than the needs of oneself. Living conditions of others in need are likely to be systematically worse than living conditions for participants, yet these conditions are not counterbalanced or controlled in the choice situations.
When evaluating the allocation of funds to oneself vs another person or other persons, a clean comparison involves oneself vs a relative vs a stranger at the same level of need, i.e. for the same product, service or purchase. By specifying meals, medication, clean water, and victim support, the scenarios give participants some information about the needs of recipients. However, the scenarios do not specify the direness of the conditions in which recipients need these tangibles. In the absence of such information, participants are likely to have relied on their own impressions or expectations of the needs of others.
Other conditions that were not controlled but are likely to be relevant are the availability of other resources and helpers that the recipient could call upon, the recipient's well-being, and conditions in the recipient's environment. For instance, purchasing power parity is not controlled. Because potential donors reside in wealthier countries, the same budget does a lot more good for others far away than for potential donors.
Not controlling for these confounders is likely to further reduce the generosity towards relatives and particularly towards strangers. The comparisons between helping strangers and helping relatives are also likely to be biased, because relatives of participants will usually be in better circumstances than recipients of charities.
A subsample of scenarios forces a choice between directly helping a single individual and helping a larger number of individuals through charities as intermediaries. In such comparisons, perceptions of these intermediaries are likely to affect decisions. To some extent these perceptions may have been influenced by the brands named in the scenarios. However, the names of organizations are less relevant for non-US participants – few people outside the US know the Schistosomiasis Control Initiative. SI 2.1.3 mentions that the ‘low recognition’ charities were “deemed highly effective”. As a result, brand awareness is confounded with deemed effectiveness.
Finally, helping more others was presented as a costless option – just switch to charity B to save more lives. In practice, for a given charity, saving more lives will be more costly than saving fewer lives, all else constant, even with economies of scale.
Because it is not feasible to unconfound the manipulations, the manuscript should at least acknowledge these limitations.
Unmeasured mechanisms. Arguments about why certain conditions could affect allocations are not tested with manipulation checks or other information. For brand recognition, for instance, the argument that a higher proportion of participants recognize RAINN than Development Media International is unlikely to hold outside the US – and perhaps even for large groups within the US. However, the data analyzed do not include information about the recognition of the names presented to participants. One way to get around this and quantify recognition is to use data on the number of donors to the various organizations named. For identified victims, the conventional explanation is that people find it easier to pay attention to a single victim and feel compassion for that victim than for a large group (Hart, Lane & Chinn, 2018; Moche, Karlsson & Västfjäll, 2024). At the same time, faced with a greater number of potential recipients needing help, participants may get the impression that the impact of their gift on each individual recipient is smaller, and the scale of the problem to be addressed is larger. By design, the decision situations analyzed here exclude the latter possibility. This leaves the greater attention to specific victims and the ‘warm glow’ of identification with the victim as the most likely mechanisms. Unfortunately, the data presented do not include affect or attention measures. Again, the discussion section could identify this limitation.
Interactive treatment effects. The inclusion of a large number of treatment conditions allows for extensive tests of interactions between them. The manuscript reports one such interaction, between recipient identification and age. Other interactions are left unexplored. The manuscript states on page 19 that "We encourage other researchers to further explore causal drivers for the remaining effects." To do so, access to the dataset is required. It would be good if the revised manuscript includes a link to a public repository containing the dataset.
Moderation of conditions by participant characteristics. Main effects of recipient characteristics on willingness to help have often been theorized to be dependent on characteristics of the decision maker - with the most common prediction that similarity would produce more gifts due to liking of one's own social identity (Baldassarri & Abascal, 2020), though for gender opposite effects have also been predicted from a sexual selection framework (Raihani & Smith, 2015). In addition, it is likely that preferences expressed in the choices are related to previous giving behavior, religious affiliation, beliefs about the effectiveness of charities, and the level of community involvement.
Outcome switching and other deviations from the preregistrations. It is very good to see a paragraph in the supplementary materials (S8) explaining deviations from the preregistrations. In the preregistrations on Aspredicted, the “identifiable victim effect” and the “mere exposure effect” were secondary hypotheses. In the paper, they figure prominently. It would be good if the manuscript mentioned that these hypotheses were promoted as a result of the larger than expected sample.
S8 does not mention several other deviations from the preregistration. While the preregistration mentions that the consequences of missing data will be investigated, the datafile includes only sessions with complete information. It is unclear how many participants failed to produce complete responses. Furthermore, in the “deliberate ignorance” conditions, participants could click to reveal the information before making a decision. It is unclear what the effect is of clicking to reveal the information, which the preregistration mentions as an external validity check. Finally, the preregistration mentions that all kinds of macro-level correlates will be explored that the Moral Machine experiment paper (Awad et al., 2018) also investigated. However, the manuscript does not report on these correlates. In contrast, it does report an analysis of the IIP scales, which the preregistrations do not mention.
Lack of theory. The manuscript does not mention theories or hypotheses underlying the design of the study. Also, the manuscript does not discuss the theoretical implications of the findings. As the discussion mentions, the results are quite at odds with some of the results conventionally believed to be true. Why were the current results different? The lack of theory may be fitting for a short article in a journal such as Nature Communications, but it is a severely missed opportunity.
Selective literature review. The manuscript presents the empirical results of the study in light of the replication crisis. Given this emphasis, references to the recent reproducibility paper in Nature (Tyner et al., 2026) and to replications of studies on the identifiable victim effect (Lesner & Rasmussen, 2014; Thomas-Walters & Raihani, 2017; Hart, Lane & Chinn, 2018; Moche, Karlsson & Västfjäll, 2024; Meier, 2025) would be apt to include in the manuscript, especially in light of the results. The cited Lee & Feeley (2016) meta-analysis is now 10 years old, and does not reflect the current state of knowledge. Another systematic review is Butts et al. (2019). More importantly, subsequent scholarship has reached very different conclusions from these reviews (see Erlandsson et al., 2024).
Other relevant literatures that the manuscript could mention concern behavior in social dilemmas, and particularly dictator games.
Participant clustering. The data provided contain a session indicator, but not a participant indicator. The same people may have participated multiple times. They may be identified by screening survey responses of participants for strongly similar patterns in background characteristics. It would be good to consider ways to include only the first session completed by participants.
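One simple way to approximate this screening is sketched below in Python. It groups sessions by an exact match on a set of background characteristics and keeps only the earliest session in each group. The field names (`session_id`, `started_at`, `country`, `birth_year`, `gender`, `education`) are hypothetical and do not refer to the actual variables in the dataset. Exact matching is a crude rule: different people can share the same profile, so it will remove some genuine first sessions, and it only applies to sessions with survey responses. It is best used as a robustness check rather than as the main analysis.

```python
def first_sessions(sessions, profile_keys):
    """Keep the earliest session for each distinct background profile.

    sessions: list of dicts, each with a sortable 'started_at' value
    profile_keys: names of the background fields used for matching
    """
    earliest = {}
    # Visit sessions in chronological order so that the first session
    # encountered for a profile is also the earliest one.
    for session in sorted(sessions, key=lambda s: s["started_at"]):
        profile = tuple(session[key] for key in profile_keys)
        earliest.setdefault(profile, session)
    return list(earliest.values())


if __name__ == "__main__":
    # Hypothetical example data; ISO-formatted dates sort chronologically.
    example = [
        {"session_id": "s1", "started_at": "2020-01-01", "country": "NL",
         "birth_year": 1990, "gender": "f", "education": "MSc"},
        {"session_id": "s2", "started_at": "2020-02-01", "country": "NL",
         "birth_year": 1990, "gender": "f", "education": "MSc"},
        {"session_id": "s3", "started_at": "2020-01-15", "country": "US",
         "birth_year": 2001, "gender": "m", "education": "BSc"},
    ]
    keys = ["country", "birth_year", "gender", "education"]
    for kept in first_sessions(example, keys):
        print(kept["session_id"], kept["started_at"])
```

Comparing the main estimates with and without the sessions flagged as likely repeats would show whether repeated participation matters for the conclusions.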
Small things. Figure 1 does not include the names of organizations in the nutrition, water and assault scenarios. It would be good to include them. Multiple footnotes and references in the manuscript link to a private Paperpile archive. It would be good to replace those links with links to DOIs.
References
Awad, E., Dsouza, S., Kim, R. et al. (2018). The Moral Machine experiment. Nature, 563, 59–64. https://doi.org/10.1038/s41586-018-0637-6
Baldassarri, D., & Abascal, M. (2020). Diversity and prosocial behavior. Science, 369(6508), 1183-1187. https://doi.org/10.1126/science.abb2432
Butts, M. M., Lunt, D. C., Freling, T. L., & Gabriel, A. S. (2019). Helping one or helping many? A theoretical integration and meta-analytic review of the compassion fade literature. Organizational Behavior and Human Decision Processes, 151, 16-33. https://doi.org/10.1016/j.obhdp.2018.12.006
Cherry, T. L., Frykblom, P., & Shogren, J. F. (2002). Hardnose the dictator. American Economic Review, 92(4), 1218-1221. https://doi.org/10.1257/00028280260344740
Erlandsson, A., Dickert, S., Moche, H., Västfjäll, D., & Chapman, C. (2024). Beneficiary effects in prosocial decision making: Understanding unequal valuations of lives. European Review of Social Psychology, 35(2), 293-340. https://doi.org/10.1080/10463283.2023.2272238
Hart, P. S., Lane, D., & Chinn, S. (2018). The elusive power of the individual victim: Failure to find a difference in the effectiveness of charitable appeals focused on one compared to many victims. PLoS ONE, 13(7), e0199535. https://doi.org/10.1371/journal.pone.0199535
Meier, D. S. (2025). Compassion for All: Real-World Online Donations Contradict Compassion Fade. Nonprofit and Voluntary Sector Quarterly, 54(2), 267-301. https://doi.org/10.1177/08997640241255707
Moche, H., Karlsson, H., & Västfjäll, D. (2024). Victim identifiability, number of victims, and unit asking in charitable giving. PLoS ONE, 19(3), e0300863. https://doi.org/10.1371/journal.pone.0300863
Raihani, N. J., & Smith, S. (2015). Competitive helping in online giving. Current Biology, 25(9), 1183-1186. https://doi.org/10.1016/j.cub.2015.02.042
Simons, D. J., Shoda, Y., & Lindsay, D. S. (2017). Constraints on generality (COG): A proposed addition to all empirical papers. Perspectives on Psychological Science, 12(6), 1123-1128. https://doi.org/10.1177/1745691617708630
Tyner, A.H., Abatayo, A.L., Daley, M. et al. (2026). Investigating the replicability of the social and behavioural sciences. Nature, 652, 143–150. https://doi.org/10.1038/s41586-025-10078-y
Competing interests
The author declares that they have no competing interests.
Use of Artificial Intelligence (AI)
The author declares that they did not use generative AI to come up with new ideas for their review.