Thursday, January 23, 2020

Contributors Team, Redux

Last summer, in a burst of sunny optimism and out of a desire for transparency, I posted a blog post about the then-new PostgreSQL Contributors Team, which was charged with updating the contributors page as required. Having now been on this mailing list for 7 months or so, I have a few - actually quite a few - comments about the whole problem space.

Recognizing contributors is important. People contribute to the PostgreSQL project for many reasons. Some have an economic interest in contributing to PostgreSQL and in being seen to contribute to PostgreSQL, while for others it is a hobby. Regardless, everybody likes to be recognized and thanked for their work. When people feel that the work they do is not appreciated, they are less likely to continue contributing. I myself have, at times, stopped doing things for PostgreSQL which I thought were valuable because other people either disagreed or did not really seem to care. Maybe that doesn't reflect well on me, but I think I'm probably not alone.

Deciding how to recognize contributors, and who to recognize, and in what way, is a lot harder. Many people, including me, appreciate the recognition that comes with having our name mentioned in a commit message as a reviewer or co-author. Those recognitions tend to feed into the release notes, another form of recognition which I think many people value. Code and review contributions are to some extent quantifiable. We have all the commits and can extract statistics from them, and I have been doing some of that annually for the last few years. If the commit messages had a standard way of crediting authors, co-authors, and reviewers, we could automate more of that work and have better statistics, which I think would be good. Broadly, data from the CommitFest application seems to be useless, because many people review but don't add their names there, and a fair number of people do the reverse. I think committers generally take care to make the list of reviewers in the commit message as accurate as they can, but that doesn't correspond very well to the theoretical list in the CommitFest application.
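To make the point about automation concrete, here is a minimal sketch of how such statistics could be gathered if commit messages did follow a standard crediting convention. The trailer names (`Author:`, `Co-authored-by:`, `Reviewed-by:`), the comma-separated name format, and the function names are all assumptions made for illustration; they are not a convention the project has agreed on, and this is not the script I use for my annual statistics.

```python
import re
from collections import Counter

# Hypothetical trailer convention, assumed purely for illustration.
# One trailer per line; several names may be separated by commas.
TRAILER_RE = re.compile(
    r"^(Author|Co-authored-by|Reviewed-by):[ \t]*(.+)$",
    re.IGNORECASE | re.MULTILINE,
)


def credits_from_message(message):
    """Return a list of (role, name) pairs found in one commit message."""
    pairs = []
    for role, names in TRAILER_RE.findall(message):
        for name in names.split(","):
            name = name.strip()
            if name:
                pairs.append((role.lower(), name))
    return pairs


def tally(messages):
    """Count, per (role, name), the number of commits crediting that person."""
    counts = Counter()
    for message in messages:
        # Use a set so a person is counted once per role per commit,
        # even if the same name is listed twice.
        for pair in set(credits_from_message(message)):
            counts[pair] += 1
    return counts


if __name__ == "__main__":
    # Made-up commit messages with made-up names.
    sample = [
        "Fix foo.\n\nAuthor: Alice Example\n"
        "Reviewed-by: Bob Example, Carol Example\n",
        "Improve bar.\n\nAuthor: Bob Example\n"
        "Reviewed-by: Alice Example\nReviewed-by: Carol Example\n",
    ]
    for (role, name), n in sorted(tally(sample).items()):
        print(f"{role:15} {name:15} {n}")
```

The commit messages themselves could come from `git log`. A tally like this is only as good as the consistency of the messages it reads, which is why a standard format would matter.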

People who make a particularly large number of the kinds of contributions that get credited in commit messages tend to (eventually) get listed as a "Contributor" or "Major Contributor," but it's not very clear what amount of such code contributions ought to entitle someone to such a listing. There is no standard agreed even among the four people on the contributors team as to what should qualify, and at least some of the people who have emailed the list feel that the standard should be much lower (which, of course, would result in a much longer list of contributors). Such a view isn't unreasonable: after all, someone who has contributed even 1 line of code to PostgreSQL at any point has made a contribution, and is thus a contributor of some kind. On the other hand, if the requirements for listing were reduced so that the list got ten times longer, it would be (a) more work to maintain and (b) less meaningful to be a part of it. I don't think it has ever been anyone's intention to list everyone who has contributed in any way, nor would it be appropriate to acknowledge only people who have made truly massive contributions, but it's not clear where in between those extremes the line ought to be drawn.

The problem grows even more acute when we look at non-code contributions, which are harder to measure. Some people work on the sysadmin team, or the code of conduct committee, or the web team, and the actual amount of work those people do likely varies from almost none to a whole lot. We have no good way of knowing how much work any given person did, nor have we any standard for how much work would justify listing somebody. Likewise, many people help to organize PostgreSQL-related events, or advocate for PostgreSQL in other ways, and many people also work on projects that are part of the PostgreSQL ecosystem, such as drivers, connection poolers, management tools, etc. It is not clear to what extent these ought to be acknowledged as contributions to PostgreSQL and to what extent they ought to be regarded as contributions to something else. For instance, Dan Langille always acknowledges the people who volunteer at PGCon in his slides at PGCon. I would encourage other people to do likewise at the events they organize, but at least some such people - or the conference organizers themselves - would also like to be acknowledged on the PostgreSQL web site itself. Similarly for code: if someone works on PostGIS or pgpool-II or DBD-Pg or node-pg, is that a contribution to that project, or is that a contribution to PostgreSQL? Maybe both.

But this magnifies the problem of who should be listed to such an extent that it's hard to even get your head around it. Now, you have to try to quantify contributions not only to the PostgreSQL code base but to every other code base, not only to the PostgreSQL infrastructure but to every other project's infrastructure, not only to organizing events about PostgreSQL but to organizing events that relate to every PostgreSQL ecosystem project. If you picked a random hacker at PGCon, they could probably list a bunch of PostgreSQL ecosystem projects, but there would probably be many ecosystem projects of which that particular hacker was totally unaware. If a given PostgreSQL driver, say, is totally unknown to anybody on the contributors team, or likewise for a given PostgreSQL event, it might still be a really important part of PostgreSQL for certain people: those who are interested in Prolog, those who live in a country far removed from the one in which I live, those who use PostgreSQL to do genome sequencing, or whatever. So you first have the problem of knowing whether to qualify something as related to PostgreSQL, and then you have the second problem of measuring the magnitude of someone's contribution to that thing that you know little about, and then you have the third problem of deciding whether it is "enough," which is not even agreed among the 4 people on the team, let alone everyone else. Perhaps I shall be accused of giving up too easily, but I think that this is totally intractable.

There is even another level of this, which is that there are people whose contribution to PostgreSQL is to manage engineers who work on it, or work with it. Such people are in many cases totally unknown to anyone in the PostgreSQL community outside their own company, but that is not to say that PostgreSQL isn't better off for their efforts. Much of my own work on PostgreSQL over the last ten years would not have been possible but for the support of my management at EnterpriseDB. I would not personally propose to list those people as contributors on that basis, but people have made essentially similar proposals regarding people who work for other companies. Perhaps some of those people are better known to the PostgreSQL community than my own boss, but not all of them are; moreover, it seems fundamentally unfair to me to make personal acquaintance more a part of who gets recognized than it is intrinsically. Still, from another point of view, who is to say that the person who secured a million dollars' worth of funding for a project is less worthy of recognition than the people who did the work? I am not enough of a technocrat to advance such a position.

There is the further problem of deciding what to do with people who used to make large contributions and still contribute occasionally. If someone has altogether stopped contributing, they can be moved to the "Past Contributors" section and probably no one will mind. But, if someone still contributes at a lesser level, should we move them from "Major Contributor" to "Contributor," thus obscuring the fact that they were once listed as a "Major Contributor"? Or do you remain a major contributor forever unless you disappear completely? Perhaps instead of a single "Past Contributors" section there should be a section for "Past Major Contributors" which is separate from "Past Contributors." Then, perhaps, people could be moved to the "Past Major Contributors" section even if their current level of contribution is non-zero, to make it clear that they did at some point do great things.

This gets at yet another problem, which is that you can imagine a system which does a better job classifying people by the type of contribution that they have made. We have two ways of doing that at present. First, current and former core team members are listed separately from contributors of other types. This feels like an anachronism. I think at some point the core team consisted of the main developers, but that's not true any more. The core team isn't primarily about development but about governance, and the people on the core team aren't necessarily the ones doing the most work for the project, though many of them do a whole lot. Personally, I think I would be inclined to abolish this distinction. Second, major contributors have a blurb (which is within their own power to edit) that describes their contributions. Contributors who are not major contributors do not get a blurb, nor do former contributors. Leaving aside the question of what sorts of code or non-code contributions qualify someone for addition to the list, we also have no way of indicating who did what. This is problematic not only because it can make it unclear why certain people are on the list and others are not, but also because the contributors team itself has no way of keeping track of it, unless we maintain a separate spreadsheet or something. I think it might be useful to tag people in some way that indicates that they are members of certain teams or have certain roles in the project (code of conduct, core, security, infrastructure, web) or that otherwise indicates what we think they contributed (conference organizing, advocacy, contributions to PostGIS, or whatever). This wouldn't solve the problem that two people who are similarly tagged might have done very different amounts of work, but at least we might have some idea who is theoretically doing what kind of work, which would be more transparent than what we have now. But any changes of the type mentioned in this paragraph would require not only consensus, which could be a challenge, but also code changes, which is another obstacle.

I'm not sure that this captures all the problems with the current system that have been pointed out to me, or that I have noticed myself, but I hope it is enough to give a feeling for some of the difficulties. It is not my purpose here either to say that the task is hopeless or, on the other hand, to propose solutions to any of the problems I've just mentioned. It is also not a defense of myself or anyone else, or on the other hand an attack. I am not trying to say that I've done the best that could be expected under the circumstances, or that my opinions are more correct than anyone else's, or that anyone else has done particularly well or poorly in dealing with these issues. Rather, it is my goal to raise awareness. If that results in some broader discussion of these issues within the PostgreSQL community, I think that would be all to the good; if it results in some consensus on how to improve things, even better.


  1. I like the tagging idea. Oracle changed their ACE program at one point which resulted in a bunch of people no longer qualifying; I've seen a number refer to themselves as "Oracle ACE Alumnus" ... maybe every PostgreSQL-related tag (for being a member of some team) could have a corresponding "alumnus" or "emeritus" tag as well. For that matter you could even have tags for working at significant PostgreSQL companies past and present.

    Another idea would be to shift the focus from providing a "list of contributors" to simply providing a "directory" of individuals who have an interest in the PostgreSQL ecosystem. Could perhaps even let everyone write their own twitter-size bio - and allow links to their personal LinkedIn, Twitter, Github, etc.

    Determining "major" vs "minor" contributions still seems tricky though, my spontaneous brainstorming has yielded no good ideas about this yet. :)

  2. Maybe it's possible to avoid the 'major/minor' tag and use lines of code/activity bands: "100 lines contributed", "10K lines contributed", etc.

    1. If I'm not mistaken, contribution to the PostgreSQL project is not counted only by the number of lines coded.

  3. Robert, thank you for the good post. It covers most of my concerns about contribution counting.
    I'd like to note one thing, which I find important as well. This is the description on our contributors page. Today it says: "These are the fine people that make PostgreSQL what it is today!". Imagine you're one of the people who contributes to PostgreSQL in some way, but is not currently listed on the page. Then you might think you're either not fine or not making PostgreSQL what it is today, which is kind of discouraging. I think it would be way better to explicitly state which contributions we currently recognize. For instance, it might say "This is the list of people who made significant core code, infrastructure, website, ... contributions within the last 3 years". That would be clearer and less discouraging for people who are not listed.

    1. +1
      If there is a contributors list, everyone who is not on the list is not a contributor.