Abstract
Punishment as a mechanism for promoting cooperation has been studied
extensively for more than two decades, but its effectiveness remains a matter
of dispute. Here, we examine how punishment's impact varies across cooperative
settings through a large-scale integrative experiment. We vary 14 parameters
that characterize public goods games, sampling 360 experimental conditions and
collecting 147,618 decisions from 7,100 participants. Our results reveal
striking heterogeneity in punishment effectiveness: while punishment
consistently increases contributions, its impact on payoffs (i.e., efficiency)
ranges from dramatically enhancing welfare (up to 43% improvement) to severely
undermining it (up to 44% reduction) depending on the cooperative context. To
characterize these patterns, we developed models that outperformed human
forecasters (laypeople and domain experts) in predicting punishment outcomes in
new experiments. Communication emerged as the most predictive feature, followed
by contribution framing (opt-out vs. opt-in), contribution type (variable vs.
all-or-nothing), game length (number of rounds), peer outcome visibility
(whether participants can see others' earnings), and the availability of a
reward mechanism. Most of these features, however, interact to influence
punishment effectiveness rather than operating independently. For
example, the extent to which longer games increase the effectiveness of
punishment depends on whether groups can communicate. Together, our results
refocus the debate over punishment from whether or not it "works" to the
specific conditions under which it does and does not work. More broadly, our
study demonstrates how integrative experiments can be combined with machine
learning to uncover generalizable patterns, potentially involving interactions
between multiple features, and help generate novel explanations of complex
social phenomena.