Replicator dynamics in public goods games with reward funds

Co-authored with Tatsuo Unemi (Soka Univ., Japan); preprint; published in Journal of Theoretical Biology, 2011.
Tatsuya Sasaki (1, 2, *)
Email address: sasakit@iiasa.ac.at
1 Evolution and Ecology Program, International Institute for Applied Systems Analysis (IIASA), Schlossplatz 1, Laxenburg A-2361, Austria
2 Graduate School of Engineering, Soka University, Hachioji, Tokyo 192-8577, Japan

Tatsuo Unemi (3)
Email address: unemi@iss.soka.ac.jp
3 Department of Information Systems Science, Soka University, Hachioji, Tokyo 192-8577, Japan

* Corresponding author at: Evolution and Ecology Program, International Institute for Applied Systems Analysis (IIASA), Schlossplatz 1, Laxenburg A-2361, Austria. Tel: +43-2236-807; Fax: +43-2236-71313

Submitted in revised form on 23 July 2011. Published in the Journal of Theoretical Biology 287 (21 October 2011), pages 109–114; Epub 3 August 2011.
Abstract
Which punishments or rewards are most effective at maintaining cooperation in public goods interactions and deterring defectors who are willing to freeload on others' contributions? The sanction system is itself a public good and can give rise to problematic "second-order free riders" who do not contribute to the provision of the sanctions and thus may subvert the cooperation supported by sanctioning. Recent studies have shown that public goods games with punishment can lead to a coercion-based regime if participation in the game is optional. Here, we reveal that even with compulsory participation, rewards can maintain cooperation within an infinitely large population. We consider three strategies for players in a standard public goods game: to be a cooperator, to be a defector, or to be a rewarder who contributes both to the public good and to a fund that rewards players who contribute during the game. Cooperators do not contribute to the reward fund and are therefore classified as second-order free riders. The replicator dynamics for the three strategies exhibit a rock-scissors-paper cycle and can be analyzed fully, despite the fact that the expected payoffs are nonlinear. The model does not require repeated interaction, spatial structure, group selection, or reputation. We also discuss a simple method for second-order sanctions, which can lead to a globally stable state where 100% of the population are rewarders.

Keywords: evolutionary game theory; cooperation; sanction; second-order social dilemma; rock-scissors-paper cycle
1. Introduction

An enduring conundrum in the biological and social sciences is how cooperation can emerge and be maintained in a sizable group containing exploiters. This conundrum is the so-called social dilemma [1, 2], whose nature is as follows: groups of cooperators outperform groups of defectors, whereas in a mixed group defectors always outperform cooperators. This captures the common conflict between the social optimum and individual interests very well, and it has traditionally been modeled as the public goods game in many experimental and theoretical studies [3]. In the public goods game (PGG), cooperators confer benefits on others at some cost to themselves, whereas defectors exploit the benefits without contributing. Defection is the selfish choice that decreases the total benefit to the group, but it is rational from the evolutionary viewpoint because it yields a higher individual payoff at no cost. Thus, natural selection will often drive cooperation to extinction. Classical and evolutionary game studies have, however, identified supporting mechanisms under which cooperation is nonetheless sustained, such as repeated interactions [4, 5], reputation [6, 7], spatial structure [8, 9], and group selection [10, 11]. Punishment of defectors and rewards for cooperators are also major factors that maintain cooperation among self-interested individuals, as suggested by growing experimental and theoretical evidence [12–32]. However, sanctions are costly and therefore pose the next conundrum: how can costly sanctioning persist in the presence of those who freeload on others' contributions to sanctions? This issue is the "second-order social dilemma" [12, 14], which has been particularly well studied in the case of costly punishment. One possible solution is to punish second-order freeloaders as well [13, 15, 24, 32]. At the same time, there is the issue of how costly punishment can emerge in the first place [21, 33].
In a population of defectors, a rare punisher suffers enormous costs because of the need to continuously punish defectors. However, recent studies have shown that punishment-based cooperation can emerge if participation in the PGG is optional rather than compulsory [20, 21, 26, 32]. We note that optional participation is another way to maintain cooperation [33–39]; it can lead to rock-scissors-paper-type cyclic domination, well known in evolutionary game theory [40, 41], among cooperators, defectors, and loners, who earn a small but fixed payoff instead of participating in the PGG [37–39]. Interestingly, Sigmund et al. [32] have found that when it comes to punishing second-order freeloaders, natural selection favors pool-punishment rather than peer-punishment. Peer-punishment, the most widely used form of punishment in PGGs, is a sanctioning technique in which players decide whether to impose fines on exploiters after the PGG. By contrast, in pool-punishment, players have to decide whether to contribute to a punishment fund before the PGG [14], analogous to forming a volunteer band of watchmen in advance. While optional participation may be required for a population to evolve from a stalemate where everybody defects to a coercion-based regime, opting out of some public goods projects, such as those concerning global environmental issues, remains problematic [21].

When participation is compulsory, peer-rewarding can cause cyclical dynamics in infinite populations, provided that reputation is important (for pairwise interactions see Sigmund et al. [16]; for interactions of arbitrary size see Hauert [30]). In contrast, reputation is given less weight in finite populations [29]. In this work, we explore the effects of pool-rewarding in compulsory PGGs with infinite populations. Similar to pool-punishment, players first decide whether to contribute to a reward fund. After a one-shot PGG among all group members, the common fund is divided equally among those players who contributed to the public good, irrespective of their contribution to the fund.
While real-world examples of reward funds are too numerous to list, we shall consider a generous voluntary fund, which may be threatened with collapse by second-order freeloaders. We propose a minimalistic model for infinite populations that does not require repeated interactions, reputation, spatial structure, group selection, or optional participation. We also compare two types of benefit-sharing models, which differ in whether or not a contributor to the PGG may herself benefit from her contribution, thus corresponding to "weak altruism" and "strong altruism" [42, 43]. The evolution of cooperation is investigated by means of the replicator dynamics [40, 41].

2. The game-theoretical model

Consider an infinitely large, well-mixed population of constant size. From time to time, a group of $N$ players is randomly formed from the population (where $N \ge 2$). The PGG is a one-shot game. Each player is asked to contribute $c_1 > 0$ to the public good. Each contribution is multiplied by $r_1 > 1$ and then distributed in one of two ways: in the case of weak altruism (WA), it is shared equally among all $N$ players in the group, whereas in the case of strong altruism (SA), it is shared among the $N - 1$ other co-players only. In both cases, if all group members contribute, they obtain a payoff of $(r_1 - 1)c_1 > 0$. The PGG with SA is a social dilemma for any value of $r_1$, and the PGG with WA is one for $r_1 < N$. Indeed, a player who does not contribute to the public good improves her payoff by $c_1$ with SA and by $c_1(1 - r_1/N) > 0$ with WA, no matter what the other players do. For the PGG with WA, we assume $r_1 < N$, as the social dilemma would otherwise be completely relaxed, since switching to a contributor would then be beneficial.
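As a quick sanity check of these single-group payoffs, here is a minimal Python sketch. The helper names `pgg_payoff` and `defection_gain` are hypothetical and not taken from the paper; the sketch simply evaluates one group of size N under the WA or SA sharing rule and shows that withholding $c_1$ gains $c_1(1 - r_1/N)$ (WA) or $c_1$ (SA), regardless of how many co-players contribute.

```python
def pgg_payoff(contributes, n_other_contributors, N, c1, r1, mode):
    """Payoff of a focal player in a single one-shot PGG group of size N (illustrative helper).

    mode="WA": each contribution, multiplied by r1, is shared among all N players,
               so a contributor gets back a share of her own contribution.
    mode="SA": each contribution, multiplied by r1, is shared among the N - 1
               co-players only, so a contributor gets nothing back from it.
    """
    if mode == "WA":
        contributors = n_other_contributors + (1 if contributes else 0)
        benefit = r1 * c1 * contributors / N
    elif mode == "SA":
        benefit = r1 * c1 * n_other_contributors / (N - 1)
    else:
        raise ValueError("mode must be 'WA' or 'SA'")
    cost = c1 if contributes else 0.0
    return benefit - cost


def defection_gain(n_other_contributors, N, c1, r1, mode):
    """Payoff increase from withholding c1, given the co-players' choices."""
    return (pgg_payoff(False, n_other_contributors, N, c1, r1, mode)
            - pgg_payoff(True, n_other_contributors, N, c1, r1, mode))


if __name__ == "__main__":
    N, c1, r1 = 5, 1.0, 3.0  # illustrative values with 1 < r1 < N
    for k in range(N):
        print(k, round(defection_gain(k, N, c1, r1, "WA"), 6),
              round(defection_gain(k, N, c1, r1, "SA"), 6))
```

With these illustrative values every row reports the same gains, $c_1(1 - r_1/N) = 0.4$ for WA and $c_1 = 1$ for SA, which is the sense in which defection dominates in both variants.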
Next, we introduce the following pool-rewarding mechanism. Before participating in the PGG, each player is first asked to contribute $c_2 > 0$ to a fund to reward cooperative behaviors in the PGG. The total contribution to the reward fund is multiplied by $r_2 > 1$ and, after the PGG, distributed equally to those who have contributed to the public good, if any. We consider the following three strategies: rewarders (R), who contribute both to the PGG and to the reward fund; cooperators (C), who contribute to the PGG but not to the reward fund; and defectors (D), who contribute neither to the PGG nor to the reward fund. If all $N$ contributors in the PGG are R-players, they each obtain a net reward of $(r_2 - 1)c_2 > 0$, and if all of them are C-players, they obtain nothing. The rewarding system is a second-order social dilemma for $r_2 < N$, because withdrawing one's contribution to the reward fund can increase individual payoff by $c_2(1 - r_2/N) > 0$. We note that pool-rewarding itself is another case of weak altruism: an R-player is allowed to obtain a return from her own contribution to the reward fund. We do not eliminate this return, because R-players are more likely to evolve with it than without it; in the latter case, D-players dominate (see Appendix A.1 for details). Nevertheless, it is not clear whether such a weakly altruistic reward system can subsist in the presence of second-order freeloaders. Indeed, the funding stage is set up before the PGG, and thus R-players cannot avoid the risk of being exploited by C-players.

We denote the expected payoffs for R-, C-, and D-players by $P_R$, $P_C$, and $P_D$, respectively, and the frequencies of the three strategies by $x$, $y$, and $z$. The expected payoff of each strategy is the sum of the payoff from the PGG and that from the reward fund. The replicator equations are written as

$$\dot x = x(P_R - \bar P), \quad \dot y = y(P_C - \bar P), \quad \dot z = z(P_D - \bar P) \qquad (x + y + z = 1). \tag{1}$$

The average payoff for the population is given by $\bar P = xP_R + yP_C + zP_D$.

We first calculate the expected payoffs from the PGG. In the case of WA, a D-player in a group with $S$ contributors obtains a benefit of $r_1 c_1 S/N$ ($0 \le S \le N - 1$). Hence, the expected payoff is given by

$$P_D^1 = \sum_{S=0}^{N-1}\binom{N-1}{S}(1-z)^S z^{N-1-S}\,\frac{r_1 c_1 S}{N} = r_1 c_1\left(1 - \frac{1}{N}\right)(1 - z), \tag{2a}$$

where $\binom{N-1}{S}(1-z)^S z^{N-1-S}$ is the probability that $S$ of the $N - 1$ co-players in the PGG are contributors. In the case of SA, a D-player in the group obtains a benefit of $r_1 c_1 S/(N - 1)$, and calculating the expected payoff as in Eq. (2a) gives

$$P_D^1 = r_1 c_1(1 - z). \tag{2b}$$

The expected PGG payoffs for R- and C-players (denoted by $P_R^1$ and $P_C^1$, respectively) are both reduced from $P_D^1$ by the cost for a contributor, $\sigma$: $\sigma = c_1(1 - r_1/N)$ in the case of WA and $\sigma = c_1$ in the case of SA.
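The closed forms (2a) and (2b) and the contributor cost $\sigma$ can be checked against the binomial sums they come from. The following Python sketch (function names are illustrative, not from the paper) evaluates both sides for a given defector frequency z and confirms that a contributor's expected PGG payoff falls short of a defector's by exactly $\sigma$.

```python
from math import comb


def pd1_sum(z, N, c1, r1, mode):
    """P_D^1 by direct summation over S, the number of contributing co-players."""
    total = 0.0
    for S in range(N):  # S = 0, ..., N - 1
        prob = comb(N - 1, S) * (1 - z) ** S * z ** (N - 1 - S)
        share = r1 * c1 * S / N if mode == "WA" else r1 * c1 * S / (N - 1)
        total += prob * share
    return total


def pc1_sum(z, N, c1, r1, mode):
    """Expected PGG payoff of a contributor (R- or C-player), also by direct summation."""
    total = 0.0
    for S in range(N):  # contributing co-players, focal player excluded
        prob = comb(N - 1, S) * (1 - z) ** S * z ** (N - 1 - S)
        if mode == "WA":
            benefit = r1 * c1 * (S + 1) / N  # own contribution is shared too
        else:
            benefit = r1 * c1 * S / (N - 1)  # own contribution goes to others only
        total += prob * (benefit - c1)
    return total


def pd1_closed(z, N, c1, r1, mode):
    """Closed forms: Eq. (2a) for WA, Eq. (2b) for SA."""
    if mode == "WA":
        return r1 * c1 * (1 - 1 / N) * (1 - z)
    return r1 * c1 * (1 - z)


def sigma(N, c1, r1, mode):
    """Cost for a contributor relative to a D-player."""
    return c1 * (1 - r1 / N) if mode == "WA" else c1


if __name__ == "__main__":
    N, c1, r1, z = 6, 1.0, 2.5, 0.3  # illustrative values
    for mode in ("WA", "SA"):
        print(mode, pd1_sum(z, N, c1, r1, mode), pd1_closed(z, N, c1, r1, mode),
              pd1_sum(z, N, c1, r1, mode) - pc1_sum(z, N, c1, r1, mode),
              sigma(N, c1, r1, mode))
```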
Regarding the reward system, the expected payoff for D-players is $P_D^2 = 0$. A C-player in a group with $S$ contributors and $S_R$ R-players (and thus $S - S_R$ C-players) receives a reward of $r_2 c_2 S_R/S$ ($0 \le S_R \le S - 1$). Hence, the expected reward for a C-player in a group with $S$ contributors is

$$P_C^2(S) = \sum_{S_R=0}^{S-1}\binom{S-1}{S_R}\left(\frac{x}{1-z}\right)^{S_R}\left(\frac{y}{1-z}\right)^{S-1-S_R}\frac{r_2 c_2 S_R}{S} = r_2 c_2\left(1 - \frac{1}{S}\right)\frac{x}{1-z}, \tag{3}$$

where $\binom{S-1}{S_R}\left(\frac{x}{1-z}\right)^{S_R}\left(\frac{y}{1-z}\right)^{S-1-S_R}$ is the probability that $S_R$ of the other $S - 1$ contributors are R-players. Consequently, the expected reward for a C-player is

$$P_C^2 = \sum_{S=1}^{N}\binom{N-1}{S-1}(1-z)^{S-1}z^{N-S}\,P_C^2(S) = r_2 c_2\left(1 - \frac{1 - z^N}{N(1-z)}\right)\frac{x}{1-z}. \tag{4}$$

Among $S$ contributors, switching from R to C increases a player's reward by $c_2(1 - r_2/S)$. Thus, the expected net reward for an R-player, $P_R^2$, is reduced from $P_C^2$ by

$$\sum_{S=1}^{N}\binom{N-1}{S-1}(1-z)^{S-1}z^{N-S}\,c_2\left(1 - \frac{r_2}{S}\right) = c_2\left(1 - r_2\,\frac{1 - z^N}{N(1-z)}\right) =: F(z). \tag{5}$$

$F(z)$ has a unique root $\hat z$ in the open interval (0, 1) if, and only if, $1 < r_2 < N$, because $F(z)$ is monotonically decreasing (since $(1 - z^N)/(1 - z) = 1 + z + \dots + z^{N-1}$ increases with $z$), $F(0) = c_2(1 - r_2/N)$, and $F(1) = c_2(1 - r_2)$ (taking the limit $z \to 1$), so that $F(0) > 0 > F(1)$ exactly when $1 < r_2 < N$. Therefore, the advantage C-players have over R-players changes from positive to negative as $z$ increases across $\hat z$.

Integrating the above results, we can determine that $P_R = P_R^1 + P_R^2$, $P_C = P_C^1 + P_C^2$, and $P_D = P_D^1 + P_D^2$, and obtain a simple expression for the average payoff for the population,

$$\bar P = c_1(r_1 - 1)(1 - z) + c_2(r_2 - 1)x, \tag{6}$$

both for the WA and SA cases.

3. Dynamics
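The reward-side closed forms, Eqs. (4) and (5), and the identity behind the reward term of Eq. (6) can likewise be checked numerically; the same sketch also locates the root $\hat z$ of $F(z)$ by bisection, which the analysis of the dynamics uses repeatedly. This is a minimal Python illustration with hypothetical function names; it assumes $z < 1$ wherever $x/(1 - z)$ appears.

```python
from math import comb


def geometric_sum(z, N):
    """(1 - z**N) / (1 - z) written as 1 + z + ... + z**(N-1), valid also at z = 1."""
    return sum(z ** k for k in range(N))


def F(z, N, c2, r2):
    """Right-hand side of Eq. (5): how much less reward an R-player expects than a C-player."""
    return c2 * (1 - r2 * geometric_sum(z, N) / N)


def F_by_sum(z, N, c2, r2):
    """Left-hand side of Eq. (5): expected value of c2 (1 - r2/S) over the number S of contributors."""
    return sum(comb(N - 1, S - 1) * (1 - z) ** (S - 1) * z ** (N - S) * c2 * (1 - r2 / S)
               for S in range(1, N + 1))


def pc2_by_sum(x, y, z, N, c2, r2):
    """P_C^2 by the double sum of Eqs. (3)-(4); requires z < 1."""
    p = 1 - z
    total = 0.0
    for S in range(1, N + 1):  # S contributors, the focal C-player included
        pS = comb(N - 1, S - 1) * p ** (S - 1) * z ** (N - S)
        for SR in range(S):     # SR rewarders among the other S - 1 contributors
            q = comb(S - 1, SR) * (x / p) ** SR * (y / p) ** (S - 1 - SR)
            total += pS * q * r2 * c2 * SR / S
    return total


def pc2_closed(x, z, N, c2, r2):
    """Closed form of Eq. (4); requires z < 1."""
    return r2 * c2 * (1 - geometric_sum(z, N) / N) * x / (1 - z)


def root_zhat(N, c2, r2, tol=1e-13):
    """Bisection for the unique root of the decreasing F on (0, 1); needs 1 < r2 < N."""
    if not 1 < r2 < N:
        raise ValueError("F has a root in (0, 1) only if 1 < r2 < N")
    lo, hi = 0.0, 1.0  # F(lo) > 0 > F(hi)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if F(mid, N, c2, r2) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    x, y, z, N, c2, r2 = 0.2, 0.5, 0.3, 5, 1.0, 3.0  # illustrative values
    print(pc2_by_sum(x, y, z, N, c2, r2), pc2_closed(x, z, N, c2, r2))
    print(root_zhat(N, c2, r2))
```

Writing $(1 - z^N)/(1 - z)$ as a finite geometric sum avoids the removable singularity at $z = 1$, so $F$ can be evaluated on the whole closed interval. For $N = 2$ the root reduces to $\hat z = 2/r_2 - 1$.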
The evolutionary dynamics of the three strategies take place in the state space $S_3 = \{(x, y, z): x, y, z \ge 0,\ x + y + z = 1\}$. The three homogeneous states in which 100% of the population are R-players ($x = 1$), C-players ($y = 1$), and D-players ($z = 1$) correspond to the three vertices of the simplex $S_3$ (which we denote by R, C, and D, respectively). These are obviously fixed points of the replicator system, Eq. (1). There are no other fixed points on the boundary of $S_3$ for non-degenerate cases. Indeed, on the edge C-D, where $x = 0$, we have $\dot z = (P_D - P_C)z(1 - z) = \sigma z(1 - z) > 0$, where $\sigma = c_1(1 - r_1/N)$ in the case of WA and $\sigma = c_1$ in the case of SA. Thus, the evolution on the edge C-D is unidirectional from C to D. On the edge R-C ($z = 0$) and on the edge D-R ($y = 0$), we obtain $\dot y = (P_C - P_R)y(1 - y) = c_2(1 - r_2/N)y(1 - y)$ and $\dot x = (P_R - P_D)x(1 - x) = [c_2(r_2 - 1) - \sigma]x(1 - x)$, respectively. The evolution on both edges is unidirectional, and its direction depends on the ordering of $r_2$ and $N$, and of $c_2(r_2 - 1)$ and $\sigma$, respectively.

To analyze the dynamics in the interior of $S_3$, let us introduce a new variable $f = x/(1 - z)$, which represents the fraction of contributors in the PGG that are also rewarders. Then

$$\dot f = \frac{xy}{(1-z)^2}\,(P_R - P_C) = -f(1 - f)F(z). \tag{7}$$

Substituting $x = f(1 - z)$ and Eq. (6) into $\dot z = z(P_D - \bar P)$ yields

$$\dot z = -z(1 - z)\left[c_2(r_2 - 1)f - \sigma\right]. \tag{8}$$

3.1. The global attractor D
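The global-attractor cases of Sections 3.1 and 3.2 can be illustrated by integrating Eqs. (7) and (8) directly. The Python sketch below uses a hand-written fixed-step fourth-order Runge-Kutta scheme (our own choice for illustration; the paper does not specify a numerical method), with hypothetical function names and illustrative parameter values.

```python
def F(z, N, c2, r2):
    """Eq. (5), with (1 - z**N)/(1 - z) expanded as 1 + z + ... + z**(N-1)."""
    return c2 * (1 - r2 * sum(z ** k for k in range(N)) / N)


def rhs(f, z, N, c2, r2, sigma):
    """Right-hand sides of Eqs. (7) and (8) in the (f, z) coordinates."""
    df = -f * (1 - f) * F(z, N, c2, r2)
    dz = -z * (1 - z) * (c2 * (r2 - 1) * f - sigma)
    return df, dz


def integrate(f, z, N, c2, r2, sigma, t_end, dt=0.01):
    """Fixed-step classical Runge-Kutta (RK4) integration; returns the final (f, z)."""
    for _ in range(int(round(t_end / dt))):
        k1 = rhs(f, z, N, c2, r2, sigma)
        k2 = rhs(f + 0.5 * dt * k1[0], z + 0.5 * dt * k1[1], N, c2, r2, sigma)
        k3 = rhs(f + 0.5 * dt * k2[0], z + 0.5 * dt * k2[1], N, c2, r2, sigma)
        k4 = rhs(f + dt * k3[0], z + dt * k3[1], N, c2, r2, sigma)
        f += dt * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6
        z += dt * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6
    return f, z


def to_xyz(f, z):
    """Convert back to strategy frequencies: x = f (1 - z), y = (1 - f)(1 - z)."""
    return f * (1 - z), (1 - f) * (1 - z), z


if __name__ == "__main__":
    # Sec. 3.1 regime: c2 (r2 - 1) = 0.5 < sigma = 1 (e.g. SA with c1 = 1).
    print(to_xyz(*integrate(0.5, 0.1, N=5, c2=0.5, r2=2.0, sigma=1.0, t_end=60)))
    # Sec. 3.2 regime: c2 (r2 - 1) = 3 > sigma = 1 and r2 = 4 > N = 3.
    print(to_xyz(*integrate(0.2, 0.5, N=3, c2=1.0, r2=4.0, sigma=1.0, t_end=60)))
```

In the first run the state approaches the vertex D ($z$ close to 1), in the second the vertex R ($x$ close to 1). Working in $(f, z)$ keeps the state on the unit square, and `to_xyz` converts back to strategy frequencies.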
Suppose that $c_2(r_2 - 1) - \sigma < 0$. Then the direction of evolution on the edge D-R is from R to D, and Eq. (8) yields $\dot z > 0$ in the interior of $S_3$. Thus, there is no interior fixed point, and all interior orbits converge to the vertex D, which is a global attractor (Fig. 1a). If $r_2 < N$, the direction of evolution on the edge R-C is from R to C; if $r_2 > N$, it is from C to R; and when $r_2 = N$, the edge R-C consists of unstable fixed points. We note that if $r_2 < 1$, then $c_2(r_2 - 1) - \sigma < 0$ holds.

In the boundary case $c_2(r_2 - 1) - \sigma = 0$, $\dot z = 0$ holds when $f = 1$, and thus the edge D-R is a line of fixed points. If $r_2 < N$, the edge is separated into an unstable segment ($0 \le z < \hat z$) and a stable one ($\hat z < z \le 1$). Since $\dot z > 0$ holds in the interior of $S_3$, all interior orbits converge to the stable segment (Fig. 1b). If $r_2 \ge N$, the edge D-R has no unstable segment. In the long run, random drift and occasional invasion by the missing C-players will eventually send the state from the stable segment to the vertex D.

3.2. The global attractor R

Suppose that $c_2(r_2 - 1) - \sigma > 0$ and $r_2 > N$. Then the direction of evolution is from D to R on the edge D-R, and from C to R on the edge R-C. Because $F(z) < 0$ on the open interval (0, 1), $\dot f > 0$ in the interior of $S_3$. Thus, there is no interior fixed point, and all interior orbits converge to the vertex R, which is a global attractor (Fig. 2). If $r_2 = N$, then the edge R-C is a line of fixed points, which consists of an unstable segment ($0 \le x < x_{RC}$) and a stable one ($x_{RC} < x \le 1$), where $x_{RC} = \sigma/[c_2(r_2 - 1)]$ is a non-trivial solution of Eq. (8). Because all interior states satisfy $\dot f > 0$, the population evolves towards the stable segment. In the long run, random drift and occasional invasion by the missing D-players will eventually bring the population to the vertex R.

3.3. The mixture equilibrium of the three strategies
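For the parameter region of this subsection, $c_2(r_2 - 1) > \sigma$ and $1 < r_2 < N$, the interior equilibrium Q and the conservation of the Hamiltonian H of Appendix A.2 can be checked numerically. In this Python sketch, the bounded part of $G(z)$ is written as an explicit polynomial, $\rho(z) = (c_2 r_2/N)\sum_{j=1}^{N-1}(N-1-j)z^j/j$, which we obtained by a partial-fraction expansion of $F(z)/(z(1-z))$; the text only states that this part is bounded, so treat the formula as this sketch's own derivation. Function names and parameter values are illustrative.

```python
from math import log


def F(z, N, c2, r2):
    """Eq. (5) with (1 - z**N)/(1 - z) expanded as 1 + z + ... + z**(N-1)."""
    return c2 * (1 - r2 * sum(z ** k for k in range(N)) / N)


def z_hat(N, c2, r2, tol=1e-13):
    """Bisection for the root of the decreasing function F on (0, 1); needs 1 < r2 < N."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if F(mid, N, c2, r2) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def equilibrium_Q(N, c2, r2, sigma):
    """Interior equilibrium (x, y, z): x = f_hat (1 - z_hat), y = (1 - f_hat)(1 - z_hat)."""
    f_hat = sigma / (c2 * (r2 - 1))
    zq = z_hat(N, c2, r2)
    return f_hat * (1 - zq), (1 - f_hat) * (1 - zq), zq


def H(f, z, N, c2, r2, sigma):
    """H = G(z) + L(f) of Appendix A.2 (interior points only).

    The polynomial rho(z) is this sketch's own partial-fraction result.
    """
    rho = c2 * r2 / N * sum((N - 1 - j) * z ** j / j for j in range(1, N))
    G = c2 * (1 - r2 / N) * log(z) + c2 * (r2 - 1) * log(1 - z) + rho
    L = sigma * log(f) + (c2 * (r2 - 1) - sigma) * log(1 - f)
    return G + L


def rhs(f, z, N, c2, r2, sigma):
    """Eqs. (7) and (8)."""
    return (-f * (1 - f) * F(z, N, c2, r2),
            -z * (1 - z) * (c2 * (r2 - 1) * f - sigma))


def rk4_orbit(f, z, N, c2, r2, sigma, t_end, dt=0.01):
    """Fixed-step classical Runge-Kutta integration; returns the final (f, z)."""
    for _ in range(int(round(t_end / dt))):
        k1 = rhs(f, z, N, c2, r2, sigma)
        k2 = rhs(f + 0.5 * dt * k1[0], z + 0.5 * dt * k1[1], N, c2, r2, sigma)
        k3 = rhs(f + 0.5 * dt * k2[0], z + 0.5 * dt * k2[1], N, c2, r2, sigma)
        k4 = rhs(f + dt * k3[0], z + dt * k3[1], N, c2, r2, sigma)
        f += dt * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6
        z += dt * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6
    return f, z


if __name__ == "__main__":
    N, c2, r2 = 5, 1.0, 2.0
    sigma = 1.0 * (1 - 3.0 / N)  # WA with c1 = 1, r1 = 3
    print("Q =", equilibrium_Q(N, c2, r2, sigma))
    f1, z1 = rk4_orbit(0.3, 0.5, N, c2, r2, sigma, t_end=50)
    print(H(0.3, 0.5, N, c2, r2, sigma), H(f1, z1, N, c2, r2, sigma))
```

Along the integrated orbit, H stays constant up to integration error, and its value lies below H at Q, consistent with closed orbits around a strict maximum.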
Suppose that $c_2(r_2 - 1) - \sigma > 0$ and $1 < r_2 < N$. Then the direction of evolution is from D to R on the edge D-R, and from R to C on the edge R-C. Thus, the three edges of $S_3$ form a heteroclinic cycle of rock-scissors-paper type. We now have a unique interior root $\hat z$ of $F(z)$ and $0 < \hat f := \sigma/[c_2(r_2 - 1)] < 1$. From Eqs. (7) and (8), we see that there is a unique interior fixed point $Q = (\hat x, \hat y, \hat z)$, with

$$\hat x = \hat f(1 - \hat z), \qquad \hat y = (1 - \hat f)(1 - \hat z). \tag{9}$$

The mixture equilibrium Q is a center, i.e., it is neutrally stable and surrounded by closed orbits that fill the interior of $S_3$ (Fig. 3). This is because Eqs. (7) and (8) can be expressed in the form of a Hamiltonian system, and the Hamiltonian H has a strict maximum at the unique fixed point $(\hat f, \hat z)$ corresponding to Q (see Appendix A.2 and [38] for details).

Given $c_1$, $r_1$, and $N$, which are all original parameters of the PGG, the location of Q is determined by the remaining parameters, $c_2$ and $r_2$. According to Eq. (9), Q lies on the line $y = (1/\hat f - 1)x$, independent of the group size $N$. As $N$ increases, Q moves toward the vertex D along this line, and $\hat z \to 1$ as $N \to \infty$. On the other hand, as $N$ decreases, Q moves in the opposite direction, and $\hat z$ decreases to $2/r_2 - 1 > 0$, which occurs when $N = 2$. In the other extreme cases, $r_2 = 1$, $r_2 = N$, $c_2(r_2 - 1) = \sigma$, and $r_2 = \infty$, Q arrives at the vertex D and at the edges R-C, D-R, and C-D, respectively.

4. Discussion

Conflict between contributors and freeloaders in public goods interactions is inevitable. How can such conflict be resolved? An effective solution is to set up a reward fund for cooperative behaviors. The key condition for the reward system to maintain cooperation in the presence of free riders in public goods games (PGGs) is

$$c_2(r_2 - 1) > \sigma, \tag{10}$$

where $\sigma = c_1(1 - r_1/N)$ in the case of weak altruism and $\sigma = c_1$ in the case of strong altruism. Eq. (10) means that the optimum group reward should exceed the cost for a contributor in the PGG; in the case of weak altruism, this cost is relaxed by the self-returning benefit $r_1 c_1/N$. In infinite populations, it has been shown that peer-rewarding is a potent motivator, but only if reputation is important [16, 30]. In pool-rewarding, however, this is not the case. With such attractive rewards, cooperative investments in both the PGG and the reward fund can subsist, even when second-order freeloaders can dominate the rewarding system, i.e., for $r_2 < N$. In this case, the replicator dynamics exhibit a rock-scissors-paper cycle among the three strategies: defectors who never contribute (first-order freeloaders), cooperators who contribute only to the PGG (second-order freeloaders), and rewarders who contribute to both.

The cyclical evolutionary scenario can be described as follows. If most players are rewarders, the reward system is actually a second-order social dilemma, and thus cooperators spread. If cooperators are prevalent, it is better to become a defector due to the social dilemma. If most players are defectors, the number of beneficiaries of the reward is usually small enough to subvert cooperator dominance over rewarders, and thus the number of rewarders increases. If the number of rewarders increases sufficiently, then the second-order dilemma returns. In this scenario, traditional defectors play a pivotal role in maintaining the cyclic domination among the three strategies. The moderate advantage defectors have over cooperators, given by $\sigma$, prevents the second-order dilemma from eliminating rewarders and thus ensures that rewarders, not cooperators, dominate.

Global environmental and energy issues often appear to be compulsory public goods projects in which, in the short term, cooperation yields very little benefit, so that the social optimum is not to cooperate. Such a situation is not a social dilemma and has thus remained outside the scope of studies on the evolution of cooperation in large groups. In our model, this may correspond to the case where $0 \le r_1 < 1$. We remark that the results shown hold even when $0 \le r_1 < 1$, and thus pool-rewarding is applicable to a broader range of public goods interactions.
We note that in the extreme case where $r_1 = 0$, our model is significantly similar to an earlier public goods game with optional participation [37, 38, 39]. Indeed, the PGG degenerates into a game in which there is no longer any benefit from the contribution $c_1$. Each player therefore effectively has the option to avoid the participation fee $c_1$ instead of taking part in another PGG with a cost of $c_2$ and a multiplier of $r_2$. This is just an implementation of the inverse form of the loner's option.

A fascinating extension of this work is to consider second-order sanctions [13, 15, 24, 32]. Indeed, in our model, it looks practical for the rewarding system to punish cooperators (second-order freeloaders) by reducing their rewards [12]. Let us see how, for instance, reducing rewards to cooperators by $100\alpha\%$ changes the dynamics. According to preliminary numerical simulations, the existing interior fixed point Q is destabilized (Fig. 4), and for discount rates $\alpha$ higher than a threshold value, the population can converge to a state of 100% rewarders, irrespective of the initial conditions (Fig. 4b). As $\alpha$ crosses the threshold, a new mixed equilibrium P of cooperators and rewarders enters the state space $S_3$ and is unstable within the rewarder-cooperator boundary (see Appendix A.3 for details). If defectors (first-order freeloaders) are absent, the population cannot avoid the resulting coordination problem: depending on the initial condition, the population evolves to either 100% rewarders or 100% cooperators. Otherwise, interestingly, the population can make an end run around the bistability and establish the social optimum. It would be an intriguing issue for future research to analyze theoretically the result that reward-based cooperation necessarily becomes globally stable whenever it cannot be invaded by second-order freeloaders. By contrast, in the case of pool-punishment, punishment-based cooperation can never become globally stable, even if second-order sanctions are assumed, because a state of 100% first-order freeloaders remains stable [44].

One important issue we left out is the effect of economies and diseconomies of scale on the provision of sanctions. So far we have focused on linear cost-benefit functions for rewarding, whereby any group of rewarders generates the same per capita group benefit. According to Mathew and Boyd [33], the existing interior fixed point of the optional public goods game becomes an attractor for decreasing returns and a repeller for increasing returns. In practice, the rich dynamics afforded by scale effects would provide many options for the proper design of sanctioning systems to support the evolution of cooperation.
Appendix

A.1. The strongly altruistic rewarding

We here turn to a strongly altruistic variant of pool-rewarding, in which the rewards resulting from an R-player's contribution are shared among the other contributors only. We assume that if there is no other contributor, the investment in the incentive from a single R-player is exactly refunded to her. The expected reward for a C-player turns into

$$P_C^2 = r_2 c_2\,\frac{x}{1-z}\left(1 - z^{N-1}\right),$$

and that for an R-player is reduced from $P_C^2$ by the expected incentive cost $c_2(1 - z^{N-1})$. Eqs. (7) and (8) turn into

$$\dot f = -c_2 f(1 - f)\left(1 - z^{N-1}\right),$$
$$\dot z = -z(1 - z)\left[c_2(r_2 - 1)f\left(1 - z^{N-1}\right) - \sigma\right].$$

Since $\dot f$ is negative in the interior of the state space $S_3$, $\operatorname{int} S_3$, there is no interior fixed point. If $c_2(r_2 - 1) - \sigma \le 0$, then $\dot z > 0$ holds in $\operatorname{int} S_3$, and thus all interior orbits converge to the vertex D. If $c_2(r_2 - 1) - \sigma > 0$, the system has a new equilibrium at

$$(f, z) = \left(1,\ \left(1 - \frac{\sigma}{c_2(r_2 - 1)}\right)^{1/(N-1)}\right)$$

on the edge D-R, which is a source. The vertex D is a sink, while the vertex R still remains a saddle. We consider the z-isocline, that is, the set where $\dot z = 0$: in $\operatorname{int} S_3$, this is the set where

$$f = \frac{\sigma}{c_2(r_2 - 1)\left(1 - z^{N-1}\right)}.$$

The interior component forms a curve that connects the new fixed point and the point $(f, z) = (\sigma/[c_2(r_2 - 1)], 0)$ on the edge R-C, and divides $\operatorname{int} S_3$ into two regions: one where $\dot z < 0$ and the other where $\dot z > 0$. The latter includes the vicinity of the edge C-D, given by $x = 0$. Since $\dot f < 0$ holds in $\operatorname{int} S_3$, any interior orbit that starts in the region with $\dot z < 0$ has to travel to the region where $\dot z > 0$. Hence, all interior orbits converge to the vertex D.
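The strongly altruistic variant can be explored with a small Python sketch of the modified equations (function names hypothetical, parameter values illustrative). It checks that the edge equilibrium $(1, (1 - \sigma/[c_2(r_2 - 1)])^{1/(N-1)})$ is indeed stationary, that $\dot f < 0$ at sample interior points, and integrates one interior orbit with a fixed-step RK4 scheme, which is our own choice of integrator rather than anything specified in the text.

```python
def rhs_strong(f, z, N, c2, r2, sigma):
    """Eqs. (7)-(8) as modified for strongly altruistic rewarding (Appendix A.1)."""
    w = 1 - z ** (N - 1)  # probability that at least one co-player contributes
    return (-c2 * f * (1 - f) * w,
            -z * (1 - z) * (c2 * (r2 - 1) * f * w - sigma))


def edge_fixed_point(N, c2, r2, sigma):
    """Equilibrium (f, z) on the edge D-R; exists when c2 (r2 - 1) > sigma."""
    return 1.0, (1 - sigma / (c2 * (r2 - 1))) ** (1 / (N - 1))


def rk4(f, z, N, c2, r2, sigma, t_end, dt=0.01):
    """Fixed-step classical Runge-Kutta integration of rhs_strong; returns the final (f, z)."""
    for _ in range(int(round(t_end / dt))):
        k1 = rhs_strong(f, z, N, c2, r2, sigma)
        k2 = rhs_strong(f + 0.5 * dt * k1[0], z + 0.5 * dt * k1[1], N, c2, r2, sigma)
        k3 = rhs_strong(f + 0.5 * dt * k2[0], z + 0.5 * dt * k2[1], N, c2, r2, sigma)
        k4 = rhs_strong(f + dt * k3[0], z + dt * k3[1], N, c2, r2, sigma)
        f += dt * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6
        z += dt * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6
    return f, z


if __name__ == "__main__":
    N, c2, r2, sigma = 5, 1.0, 3.0, 1.0  # SA with c1 = 1; c2 (r2 - 1) = 2 > sigma
    print(edge_fixed_point(N, c2, r2, sigma))
    print(rk4(0.9, 0.2, N, c2, r2, sigma, t_end=100))
```

Even though the orbit starts close to the edge D-R with many rewarders, $f$ keeps falling and the state ends near the vertex D, in line with the argument above.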
A.2. The Hamiltonian system

Divide the right-hand sides of Eqs. (7) and (8) by the function $f(1 - f)z(1 - z)$, which is positive for any $(f, z)$ in the interior of the unit square $[0,1]^2$. Hence,

$$\dot f = -\frac{F(z)}{z(1-z)} =: -g(z), \qquad \dot z = \frac{\sigma - c_2(r_2 - 1)f}{f(1 - f)} =: h(f).$$

This transformation corresponds to a change in velocity and does not affect the orbits. Let us introduce $H(f, z) := G(z) + L(f)$, where $G(z)$ and $L(f)$ are primitives of $g(z)$ and $h(f)$, respectively:

$$G(z) = c_2\left(1 - \frac{r_2}{N}\right)\log z + c_2(r_2 - 1)\log(1 - z) + \rho(z),$$
$$L(f) = \sigma\log f + \left[c_2(r_2 - 1) - \sigma\right]\log(1 - f),$$

with $\rho(z)$ bounded on [0, 1]. Thus, we obtain the Hamiltonian system

$$\dot f = -\frac{\partial H}{\partial z}, \qquad \dot z = \frac{\partial H}{\partial f}.$$

Because the system is conservative and the Hamiltonian attains a strict global maximum at $(\hat f, \hat z)$ if $c_2(r_2 - 1) - \sigma > 0$ and $1 < r_2 < N$, the interior equilibrium Q is a stable point surrounded by closed orbits. Indeed, all interior orbits are closed: $G(z) \to -\infty$ as $z \to 0$ or $z \to 1$ if $1 < r_2 < N$, and $L(f) \to -\infty$ as $f \to 0$ or $f \to 1$ if $0 < \sigma < c_2(r_2 - 1)$. Hence, $H \to -\infty$ uniformly near the boundary of $[0,1]^2$, and thus all constant level sets of H are closed curves around $(\hat f, \hat z)$. The solutions have to remain on the constant level sets and thus return to their starting points.

A.3. The second-order sanctioning

We examine an extended model in which rewards for cooperators (second-order freeloaders) are reduced by $100\alpha\%$ ($0 \le \alpha \le 1$), under the assumptions that $c_2(r_2 - 1) > \sigma$ and $1 < r_2 < N$. In the extension, the expected reward for a cooperator is given by

$$P_C^2 = (1 - \alpha)\,r_2 c_2\left(1 - \frac{1 - z^N}{N(1 - z)}\right)\frac{x}{1 - z},$$

and Eqs. (7) and (8) turn into

$$\dot f = -f(1 - f)\left[F(z) - \alpha\left(c_2(r_2 - 1) + F(z)\right)f\right],$$
$$\dot z = -z(1 - z)\left[c_2(r_2 - 1)f - \sigma - \alpha\left(c_2(r_2 - 1) + F(z)\right)f(1 - f)\right].$$

In the interior of $S_3$, there exists at most one fixed point $Q = (f_Q, z_Q)$ such that

$$f_Q = \frac{\sigma}{\alpha\sigma + (1 - \alpha)c_2(r_2 - 1)} \qquad\text{and}\qquad F(z_Q) = c_2(r_2 - 1)\,\frac{\alpha f_Q}{1 - \alpha f_Q}.$$

The fact that $F(z)$ is monotonically decreasing and $F(z_Q) \ge 0$ yields $0 < z_Q \le \hat z$, where $\hat z$ is the unique solution of $F(z) = 0$. With increasing $\alpha$, $f_Q$ increases and $z_Q$ decreases. This implies that as $\alpha$ increases, Q moves towards the edge R-C. As $\alpha$ crosses a threshold $\alpha_P$ given by

$$\alpha_P = \frac{F(0)}{c_2(r_2 - 1) + F(0)},$$

a new equilibrium with $(f, z) = \left(\frac{N - r_2}{\alpha r_2(N - 1)},\ 0\right)$ enters the edge R-C through the vertex R, which then turns into a sink. This boundary equilibrium, P, is a saddle point, unstable within the edge and stable against invasion by defectors. As $\alpha$ increases further, P moves towards the vertex C, and when $\alpha$ crosses another threshold $\alpha_Q$ given by

$$\alpha_Q = \frac{F(0)}{\sigma + F(0)},$$

Q exits $S_3$ through P, which then turns into a source. For larger values of $\alpha$, $S_3$ has no interior equilibrium, but P still remains within the edge. Preliminary numerical simulations imply that Q is a source for $\alpha > 0$ and that all interior orbits converge, if $0 < \alpha < \alpha_P$, to a heteroclinic cycle on the boundary of $S_3$, and, if $\alpha_P < \alpha \le 1$, to the vertex R.
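The thresholds $\alpha_P$ and $\alpha_Q$ and the location of Q in the extended model can be computed directly from the formulas above. The following Python sketch (hypothetical names; illustrative parameters $N = 5$, $c_2 = 1$, $r_2 = 2$, $\sigma = 0.4$) computes them, finds $z_Q$ by bisection, and verifies that the modified right-hand sides vanish at Q. It does not reproduce the preliminary simulations on the stability of Q.

```python
def F(z, N, c2, r2):
    """Eq. (5) with (1 - z**N)/(1 - z) expanded as 1 + z + ... + z**(N-1)."""
    return c2 * (1 - r2 * sum(z ** k for k in range(N)) / N)


def thresholds(N, c2, r2, sigma):
    """Return (alpha_P, alpha_Q) as given in Appendix A.3."""
    A, F0 = c2 * (r2 - 1), F(0.0, N, c2, r2)
    return F0 / (A + F0), F0 / (sigma + F0)


def f_of_P(alpha, N, r2):
    """f-coordinate of P on the edge R-C (z = 0); P lies in S3 only if this is <= 1."""
    return (N - r2) / (alpha * r2 * (N - 1))


def point_Q(alpha, N, c2, r2, sigma, tol=1e-14):
    """(f_Q, z_Q) of the interior equilibrium; meaningful for 0 <= alpha < alpha_Q."""
    A = c2 * (r2 - 1)
    fQ = sigma / (alpha * sigma + (1 - alpha) * A)
    target = A * alpha * fQ / (1 - alpha * fQ)  # required value of F(z_Q)
    lo, hi = 0.0, 1.0  # F is decreasing, so bisect for F(z) = target
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if F(mid, N, c2, r2) > target:
            lo = mid
        else:
            hi = mid
    return fQ, 0.5 * (lo + hi)


def rhs_alpha(f, z, alpha, N, c2, r2, sigma):
    """Modified Eqs. (7) and (8) with rewards to cooperators cut by 100*alpha percent."""
    A, Fz = c2 * (r2 - 1), F(z, N, c2, r2)
    df = -f * (1 - f) * (Fz - alpha * (A + Fz) * f)
    dz = -z * (1 - z) * (A * f - sigma - alpha * (A + Fz) * f * (1 - f))
    return df, dz


if __name__ == "__main__":
    N, c2, r2, sigma = 5, 1.0, 2.0, 0.4
    print(thresholds(N, c2, r2, sigma))
    for alpha in (0.0, 0.2, 0.5):
        print(alpha, point_Q(alpha, N, c2, r2, sigma))
```

For these values, $\alpha_P = 0.375$ and $\alpha_Q = 0.6$; at $\alpha_Q$ the formulas give $f_Q = f_P = 0.625$ with $z_Q = 0$, which is where Q leaves $S_3$ through P.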
References [1] R.M. Dawes, Social dilemmas, Annu. Rev. Psychol. 31 (1980) 169â193. [2] P. Kollock, Social dilemmas: the anatomy of cooperation, Annu. Rev. Sociol. 24 (1998) 183â214. [3] J.O. Ledyard, Public goods: a survey of experimental research, in: J.H. Kagel, A.E. Roth (Eds.), The Handbook of Experimental Economics, Princeton University Press, Princeton, NJ, 1995, pp. 111â194. [4] Q. Trivers, The evolution of reciprocal altruism, Rev. Biol. 46 (1971) 35â57. [5] R. Axelrod, W.D. Hamilton, The evolution of cooperation, Science 211 (1981) 1390â 1396. [6] M.A. Nowak, K. Sigmund, Evolution of indirect reciprocity by image scoring, Nature 393 (1998) 573â577. [7] M. Milinski, D. Semmann, H.-J. Krambeck, Reputation helps to solve the 'tragedy of the commons', Nature 415 (2002) 424â426. [8] M.A. Nowak, R.M. May, Evolutionary games and spatial chaos, Nature 359 (1992) 826â829. [9] T. Killingback, M. Doebeli, N. Knowlton, Variable investment, the Continuous Prisoner's Dilemma, and the origin of cooperation, Proc. R. Soc. B. 266 (1999) 1723â 1728. [10] D.S. Wilson, E. Sober, Reintroducing group selection to the human behavioral sciences, Behav. Brain Sci. 17 (1994) 585â654. [11] A. Traulsen, M.A. Nowak, Evolution of cooperation by multilevel selection, Proc. Natl. Acad. Sci. U.S.A. 29 (2006) 10952â10955. [12] P. Oliver, Rewards and punishments as selective incentives for collective action: theoretical investigations, Am. J. Sociol. 85 (1980) 1356â1375. [13] R. Axelrod, An evolutionary approach to norms, Am. Polit. Sci. Rev. 80 (1986) 1095â 1111.
[14] T. Yamagishi, The provision of a sanctioning system as a public good, J. Pers. Soc. Psychol. 51 (1986) 110–116.
[15] R. Boyd, P.J. Richerson, Punishment allows the evolution of cooperation (or anything else) in sizable groups, Ethol. Sociobiol. 13 (1992) 171–195.
[16] K. Sigmund, C. Hauert, M.A. Nowak, Reward and punishment, Proc. Natl. Acad. Sci. U.S.A. 98 (2001) 10757–10762.
[17] E. Fehr, S. Gächter, Altruistic punishment in humans, Nature 415 (2002) 137–140.
[18] J. Andreoni, W.T. Harbaugh, L. Vesterlund, The carrot or the stick: rewards, punishments, and cooperation, Am. Econ. Rev. 93 (2003) 893–902.
[19] A. Gardner, S.A. West, Cooperation and punishment, especially in humans, Am. Nat. 164 (2004) 753–764.
[20] J. Fowler, Altruistic punishment and the origin of cooperation, Proc. Natl. Acad. Sci. U.S.A. 102 (2005) 7047–7049.
[21] C. Hauert, A. Traulsen, H. Brandt, M.A. Nowak, K. Sigmund, Via freedom to coercion: the emergence of costly punishment, Science 316 (2007) 1905–1907.
[22] M. Sefton, R. Shupp, J. Walker, The effects of rewards and sanctions in provision of public goods, Econ. Inquiry 45 (2007) 671–690.
[23] K. Sigmund, Punish or perish? Retaliation and collaboration among humans, Trends Ecol. Evol. 22 (2007) 593–600.
[24] T. Kiyonari, P. Barclay, Cooperation in the social dilemma: free riding may be thwarted by second-order reward rather than by punishment, J. Pers. Soc. Psychol. 95 (2008) 826–842.
[25] M. Shinada, T. Yamagishi, Bringing back Leviathan into social dilemmas, in: A. Biel, D. Eek, T. Gärling (Eds.), New Issues and Paradigms in Research on Social Dilemmas, Springer-Verlag, Berlin, Germany, 2008, pp. 93–123.
[26] H. De Silva, C. Hauert, A. Traulsen, K. Sigmund, Freedom, enforcement, and the social dilemma of strong altruism, J. Evol. Econ. 20 (2009) 203–217.
[27] M. Nakamaru, U. Dieckmann, Runaway selection for cooperation and strict-and-severe punishment, J. Theor. Biol. 257 (2009) 1–8.
[28] R. Boyd, H. Gintis, S. Bowles, Coordinated punishment of defectors sustains cooperation and can proliferate when rare, Science 328 (2010) 617–620.
[29] P.A.I. Forsyth, C. Hauert, Public goods games with reward in finite populations, J. Math. Biol. (2010), published online 24 September 2010, doi:10.1007/s00285-010-0363-7.
[30] C. Hauert, Replicator dynamics of reward & reputation in public goods games, J. Theor. Biol. 267 (2010) 22–28.
[31] C. Hilbe, K. Sigmund, Incentives and opportunism: from the carrot to the stick, Proc. R. Soc. B 277 (2010) 2427–2433.
[32] K. Sigmund, H. De Silva, C. Hauert, A. Traulsen, Social learning promotes institutions for governing the commons, Nature 466 (2010) 861–863.
[33] S. Mathew, R. Boyd, When does optional participation allow the evolution of cooperation?, Proc. R. Soc. B 276 (2009) 1167–1174.
[34] J.H. Orbell, R.M. Dawes, Social welfare, cooperator's advantage, and the option of not playing the game, Am. Sociol. Rev. 58 (1993) 787–800.
[35] C.A. Aktipis, When to walk away and when to stay: cooperation evolves when agents can leave unproductive partners and groups, J. Theor. Biol. 231 (2004) 249–260.
[36] T. Sasaki, T. Unemi, Probabilistic participation in public goods games, Proc. R. Soc. B 274 (2007) 2639–2642.
[37] C. Hauert, S. De Monte, J. Hofbauer, K. Sigmund, Volunteering as Red Queen mechanism for cooperation in public goods games, Science 296 (2002) 1129–1132.
[38] C. Hauert, S. De Monte, J. Hofbauer, K. Sigmund, Replicator dynamics for optional public goods games, J. Theor. Biol. 218 (2002) 187–194.
[39] D. Semmann, H.-J. Krambeck, M. Milinski, Volunteering leads to rock-paper-scissors dynamics in a public goods game, Nature 425 (2003) 390–393.
[40] J. Hofbauer, K. Sigmund, Evolutionary Games and Population Dynamics, Cambridge University Press, Cambridge, 1998.
[41] K. Sigmund, M.A. Nowak, Evolutionary dynamics of biological games, Science 303 (2004) 793–798.
[42] J.A. Fletcher, M. Doebeli, A simple and general explanation for the evolution of altruism, Proc. R. Soc. B 276 (2009) 13–19.
[43] J.A. Fletcher, M. Zwick, The evolution of altruism: game theory in multilevel selection and inclusive fitness, J. Theor. Biol. 245 (2007) 26–36.
[44] K. Sigmund, C. Hauert, A. Traulsen, H. De Silva, Social control and the social contract: the emergence of sanctioning systems for collective action, Dyn. Games Appl. 1 (2011) 149–171.
Figure Captions
Figure 1. Defectors (first-order freeloaders) prevail. Oscillations do not occur and the interior state space has no fixed point. (a) All interior states evolve towards the vertex D. (b) In the boundary case $c_2(r_2 - 1) - \sigma = 0$, the edge D-R is a line of fixed points. All interior orbits converge to a stable (lower) segment of this edge. Random drift and occasional invasions by the missing C-players will eventually send the state to the vertex D. Parameters: $N = 5$; $r_1 = 3$; $c_2 = 1$; $r_2 = 1.2$ (a) or $1.4$ (b); $\sigma = 0.4$; and $c_1 = 1$ (in the case of WA), $c_1 = 0.4$ (in the case of SA).
Figure 2. Rewarders prevail. Oscillations do not occur and all interior states evolve towards the vertex R. The interior state space has no fixed point. Parameters: $N = 5$; $r_1 = 3$; $c_2 = 1$; $r_2 = 5.5$; $\sigma = 0.4$; and $c_1 = 1$ (in the case of WA), $c_1 = 0.4$ (in the case of SA).
Figure 3. Rock-scissors-paper cycles. All three corners of the simplex $S_3$ are saddle points and the boundary of $S_3$ represents a heteroclinic cycle. The interior of $S_3$ has a unique fixed point Q, which is a center surrounded by closed orbits. Parameters: $N = 5$; $r_1 = 3$; $c_2 = 1$; $r_2 = 3$; $\sigma = 0.4$; and $c_1 = 1$ (in the case of WA), $c_1 = 0.4$ (in the case of SA).
Figure 4. The effects of second-order sanctions. (a) The existing interior fixed point Q turns into a repeller when a percentage of the rewards for cooperators is cut off, and the population converges to a heteroclinic cycle on the boundary of $S_3$. (b) For a sufficiently high percentage, the vertex R can be a global attractor. At the same time, $S_3$ has a boundary fixed point P, which divides the basins of attraction of rewarders and cooperators on the edge R-C and is stable against invasion by defectors. Parameters: $N = 5$; $r_1 = 3$; $c_2 = 1$; $r_2 = 3$; $\sigma = 0.4$; and $c_1 = 1$ (in the case of WA), $c_1 = 0.4$ (in the case of SA). The rewards for cooperators are cut by (a) 10% and (b) 20%.
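As a quick arithmetic check of the boundary case in Figure 1(b), the sketch below evaluates the left-hand side of the condition $c_2(r_2 - 1) - \sigma = 0$ for both panels. It assumes the condition and values as listed in the Figure 1 caption ($c_2 = 1$, $\sigma = 0.4$, and $r_2 = 1.2$ in panel (a) or $1.4$ in panel (b)); the helper name `boundary_lhs` is hypothetical. Because $1.4 - 1$ is not exactly $0.4$ in binary floating point, the panel (b) value is compared to zero with a tolerance rather than with exact equality.

```python
import math


def boundary_lhs(c2, r2, sigma):
    """Left-hand side of the boundary condition c2*(r2 - 1) - sigma = 0.

    Hypothetical helper; assumes the symbols c2, r2 and sigma of the
    Figure 1 caption.
    """
    return c2 * (r2 - 1.0) - sigma


# Caption values (assumed reading): c2 = 1, sigma = 0.4, r2 = 1.2 (a) or 1.4 (b).
panel_a = boundary_lhs(c2=1.0, r2=1.2, sigma=0.4)
panel_b = boundary_lhs(c2=1.0, r2=1.4, sigma=0.4)

print(round(panel_a, 3))                          # -0.2
print(math.isclose(panel_b, 0.0, abs_tol=1e-12))  # True
```

Panel (a) gives a strictly negative value, while panel (b) sits exactly on the boundary case in which the caption reports a line of fixed points along the edge D-R.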
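[Figure 1 image: panels (a) and (b), each showing the simplex with vertices R, D and C]
[Figure 2 image: the simplex with vertices R, D and C]
[Figure 3 image: the simplex with vertices R, D and C and the interior fixed point Q]
[Figure 4 image: panels (a) and (b), the simplex with vertices R, D and C, with the fixed points Q and P marked]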