I feel I should weigh in here even though I should be working. I agree pretty much completely with Aeolus's first post so I'm not going to repeat anything he said, and I will completely defend his proposed banlist thread even if we didn't discuss the minute details (because why should we have to, it's a proposal and we talked about starting with a banlist months ago).
The main reason I defend the posting of Aeolus's "Proposed Starting Banlist" thread is because, lest we forget, this is a discussion forum of potential pokemon policies. If there is one thing that we both, as the obvious and long-term leaders of the Gen IV Tiering efforts, have striven for throughout, it is collaborative engagement. Collaboration with each other, collaboration with Tiering Contributors both would-be and appointed, collaboration with our entire community. A proposition is a fine springboard for discussion, and an especially welcome gesture from a relatively "all-powerful" facilitator of competitive pokémon.
The power level defining ubers was never seriously reconsidered. We have a duty as competitive players to explore that power level properly, especially in the face of a new game. I have seen a lot of posts by people so confidently stating that not much will change. This couldn't be more wrong. The truth is you have no idea what effect subtle changes to move pools, move stats (e.g. power, PP), and new Pokémon will have on the relative quality of Pokémon. It doesn't take much to shift the game significantly, and deciding a ban list in advance will effectively blind you to it.
To be clear, I am first and foremost pleased you made this thread. Its spirit is the one thing I value above all else with regard to competitive pokemon—actual proactivity towards and passion for the game. This is a quality that has been sorely lacking in our community for years and is still a very, very big problem that sadly can only be addressed by people who care.
This mistake, made early in the history of DP, laid the foundation for all of the tiering debates to come. It is a mistake that should have been avoided. Only banning broken Pokémon, after plenty of play experience, would have been years shorter than the process that actually ensued, and not tainted by doubts about its legitimacy.
By November 2007, Shoddy Battle 1 had ladder functionality. The Smogon arbitrary ban list had not changed in that time. Unfortunately, said arbitrary ban list was already well ingrained, and any major change to it was impossible. Independent of Smogon, we (me, AA, obi, tenchi, and others) adopted a very minor testing scheme, involving a tournament to test Deoxys-S. One thing we learned from the tournament is that Swiss tournaments are too complex for most players in this community, at least without software support. More significantly, not a single person of the hundreds of people who had played in the tournament voiced a problem with Deoxys-S being unbanned.
Cathy, if you felt so strongly about starting Gen IV play with no bans at all, there was very literally nothing stopping you from adopting a not-so-minor testing scheme with no bans on your Official Server. I genuinely wish you had done this and ask you completely without rhetoric why you did not, especially given your willingness to include the controversial Wobbuffet and Deoxys-S in a metagame you had literal control over.
Two weeks after the conclusion of the tournament, some notable Smogon members, who were up to that point uninvolved with Official Server, were so excited by our unbanning of Deoxys-S that they asked if I could unban Wobbuffet immediately, without the benefit of another tournament. It turned out that Wobbuffet was the next item on our list anyway, but we mulled over whether another tournament was worth it. Ultimately, in light of the fact that the previous tournament had failed to convince anybody of anything, we decided to forge ahead and unban Wobbuffet. The backlash was intense. No one wanted to test Wobbuffet. In public, I defended our move, but in private, I was quite upset with AA. I had put in hundreds of hours of work writing a Pokémon simulator, which was extremely popular, and was the basis of competitive Pokémon on the internet at that point, and everybody hated me for some minor tier experimentation that wasn't even my idea. This was extremely grating.
I was so upset by the backlash that I attempted to devise a statistical argument for banning Wobbuffet. Unfortunately, it couldn't be done. Barely anybody even used Wobbuffet on the ladder. You could play the game as though Wobbuffet did not exist, and you would only lose the occasional match. In effect, this was not a broken Pokémon, because it didn't affect how you constructed your team at all, as far as ladder play was concerned. This never changed for the entirety of Official Server.
The lesson learned here is that popular opinion cannot be ignored in tiering decisions. Strong feelings that a Pokémon is broken prevent it from being tested. In fact, the hatred for this Pokémon was so intense that any vote to ban it would have easily been by a 2/3 supermajority, and probably much more.
This reflection seems to forget two key things (insofar as the reflection itself is rather pointless if it was actually made with cognizance of the keys I am about to recite). First, I hope you realize that appealing to emotions when recounting how "grating" your experience was or how "upset" you were will not go very far here and has no relevance as far as what "we" learned. The Suspect Test Process was not a walk in the park for either Aeolus or myself, but you will hardly read public accounts of my disappointment and likely not find any of Aeolus.
Second, you touch upon an important lesson we all hopefully did learn through Wobbuffet. But, again, what was stopping you from posting your proposal of a supermajority vote then, in March 2008? It likely wouldn't have been opposed according to your suspicions. Why did you stay silent? Why did most everyone else involved in competitive pokemon? Why did I have to be the one to make the effort to actually get Wobbuffet banned from your server when it became obvious that statistics weren't going to cut it? As I said after two months of effort in that thread:
Jumpman16 said:
In fact, the only concrete things we have to go on, again, are the aforementioned #39 -> #43 -> #43 -> #46 in weighted usage on the ladder and the fact that the argument that Shed Shell and U-turn usage has not gone up as was expected by many of the people against Wobby. Again, I'm not speaking for Colin, but even if he doesn't seem interested in analyzing something other than numbers, "that's why I made this thread".
Collaborative engagement was the only thing that was going to work to get Wobbuffet banned. And collaborative engagement is one of the only ideals that actually cannot be faulted about the entire Suspect Test Process of Gen IV.
Cathy said:
Unfortunately, things went very far downhill shortly after this. The next year was spent on entirely pointless "tests" because by its very design, so-called "Stage 2" was 100% pointless. Eventually, when Stage 3 rolled around, the results of Stage 2 were irrelevant.
Jabba already defended the process, which, again, was his and agreed upon by a..."supermajority" of other PR posters as a result of the collaborative engagement created when I made my Order of Operations thread. But for the record, my intention was for the Stage 2 "tag" to have a quantitative weight on whatever the Stage 3 results would be. I've already made clear elsewhere that the point of the Suspect Test was this, as voiced in the beginning of this year in my Smog article:
My goal has always been to include the community in the process of making and maintaining our competitive tiers, even though it would have been much faster to simply poll the opinions of a few of our tenured, well-respected and battle-tested members instead.
I don't care about your assessment of the merits of this goal any more than you care to ask (this goes for everyone, not just Cathy). The aim was clear, and, most importantly, underlines the collaborative engagement I've referenced quite a few times already.
Let's take a step back and think about the previous paragraph. A whole year was wasted by a process that was designed out of the box to be pointless. I want to make sure that is very clearly understood. Stage 2 was pointless. This is so important to understand because it is often bandied about that proper tiering processes take too long. In reality, poor decisions regarding the tiering process are what make it take too long. Unfortunately, DP was a case of the latter. A sane process would have been similar to stage 3 from the start. Also important is that a sane process would have stopped at the design of the first test, and considered only a simple rating and deviation check.
Another way in which things went far downhill was the introduction of two extra metrics to filter the voter pool. First, voters had to submit "paragraphs" which were never published for public inspection, and which were arbitrarily used to decide who would vote. This measure alone ruined the system. Particularly ironic is the fact that in ruining the system, it was also made slower, and one big complaint is always how slow things are; this was the fault of the people making this complaint.
The second big mistake that was made around this time was the introduction of "suspect experience". This is a secret measure that no one except for three people knows the definition of. We were told repeatedly that it was good, and useful, but of course, since we couldn't see it, we had no idea. At this point, the process was devastated. Voters were excluded on completely mysterious grounds, both through paragraph submissions and a top secret formula that was a terrible idea, and remains a terrible idea.
All three of these ideas (Stage 2, voting paragraphs, SEXP) were proposed and discussed both openly and repeatedly in the spirit of collaborative engagement, as per the norm. Only regarding SEXP would you have any argument, which you have already posted (and has already been addressed by me, Doug and X-Act countless times). The three of us, including Aeolus, do not feel that it is a "terrible idea", and since we are the only ones who know what it is, this is very simply a matter of whether you trust us, as Doug posted last year. Just because you or others may not doesn't mean that SEXP or the "mysterious grounds" upon which Aeolus and I rejected paragraphs (or accepted them, which you should have taken equal issue with if you wanted your concerns to seem impartial) were a bad idea, and you are simply going to have to deal with that.
The next substantive thing to happen wasn't until August 2009, when so-called "Stage 3" started. This represented a process similar to what the process should have been from the start. Particularly jarring was the way it had been designed to make the previous year's work useless. The flaw here was wasting the previous year; Stage 3 should have been the entire process. Stage 3 was still a mess though. My attempts to improve it slightly ended up wasting many dozens of hours of my time, and ultimately led to nothing, despite the large number of people who supported something along the lines I was proposing.
Again, I hope you don't think that referencing the waste of your singular time is going to move many people. Unless you literally don't believe what you are telling us "we learned" from this process, I am pretty sure there are others in this community who have, by your own pessimistic and results-oriented calculations, wasted hundreds more hours than you on it, so a further appeal to emotion is either contradictory to your actual thoughts or dwarfed by the largely unvoiced disappointments of more involved parties.
Besides, I'm not even sure what you mean by stating your attempts "ultimately led to nothing". Didn't you get the supermajority observation that you wanted when you finally decided to speak up?
After Stage 3, things became even worse. After messing up immensely over the last year and a half, the wasted time was used as a reason to introduce another bad process. First of all, after messing up so badly, there should have been a major leadership change in tiering policy. How does it make any sense that after messing up badly you get a second chance? We have plenty of people far more capable of handling tiering than the people who handled it this generation. We need people with special skills. People who not only enter tournaments, but place well in them. People who engage with strategy and the community in Stark Mountain. People who have contributed to site content more recently than two or three years ago. People who are capable of putting in the technical work required to make processes a reality. It's time for other capable members of the community to set the direction for tiering policy.
Okay, let's step back for a second and look at this with the objectivity you called for in your opening. Why *were* Aeolus and I allowed to stay in our leadership position of Tiering Contributors? Please ask yourself this honestly and, again, without any rhetoric, because I intend none whatsoever. Why do you think that we were "permitted" to continue leading the Tiering directive?
A good guess would be that we had perhaps built up enough trust in the community after our repeated efforts at collaborative engagement that the very community we explicitly sought to respect with the spirit of the Suspect Test Process still trusted us to see our process through. I wouldn't make this guess, though...instead, I would simply repeat the not-so-rhetorical questions I posed to 5KR when he asked the very same questions you are now:
Jumpman16 said:
what if gouki, an upper requirement voter for this test and generally respected member of our community, tallied the votes on latios? after all, gouki had no problem reaching the upper limit, and, more importantly, had an extremely high Suspect EXP ranking.
he voted uber. under your assumption that experience in a given suspect test metagame is required to be able to determine "adequate suspect usage and sound voting reasoning", gouki is much, much more qualified than i to tally submissions.
if he had, how do you think he would have perceived the 50 or so submissions? do you think he would be more inclined to find issue with the submissions that state why the would-be voters feel that latios is OU, or less inclined? no inclination? why?
and of those submissions that leaned uber...would he be more inclined to agree with these than those that did not? less? no inclination? why?
do you know what i think about latios?
do you know why you don't know what I think about latios?
Now this thread of yours may very well succeed in "ousting" Aeolus and me as Tiering Contributors. I don't really care if it does or doesn't. But you must realize that if anyone besides us actually wanted to step in and lead this process, they wouldn't need the urging of an admin dissatisfied with the process to do so years after the fact (as if there's only one such admin to begin with).
I honestly and respectfully think that your rallying cry is extremely pathetic. Not because of its merit (and not "you", don't read this comment as a personal attack, if I wanted to call you pathetic I would), but because I know and Aeolus knows that the mere fact that you even had to *post* it should underline what we personally are disappointed in most about the community: people here rarely step up to lead even if they are "obviously" more qualified for the position. It underlines my meaning in the beginning of this post where I said I was pleased that you had posted—I am glad it has now been made apparent by one of the "few people"—by the faulty perception of the rest of the community—worth listening to. You have highlighted the source of my outright discouragement with the would-be all star battlers who fit the description of "leader" you so plainly outline in your missive better than I could have, in the sense that such a lament wouldn't do much good coming from me myself (which is why I've never posted it).
I am actually now compelled to break down your job description to see how many people here are qualified to lead us into Gen V.
We need people with special skills.
Great! We have a lot of users like this, even if the description is kind of vague for a leadership position.
People who not only enter tournaments, but place well in them.
We have a ton of these people too, and easy to point out as well. A promising start! Let's keep going.
People who engage with strategy and the community in Stark Mountain.
Uh oh. That number has greatly diminished already! It is a shame that not too many people are willing to post in Stark Mountain or Policy Review (posting in the latter being a function of excelling in the former forum). We could name names of people who have passed on the first three of your requirements, but I fear that this list is already rather short.
People who have contributed to site content more recently than two or three years ago.
"Contribution" to the site is as vague as the several threads in Inside Scoop pertaining to the word's meaning with respect to badges would suggest, but this doesn't necessarily weed out anyone who has passed your first four requirements.
People who are capable of putting in the technical work required to make processes a reality.
Hmmm, "capable". Another eager quality, no doubt, but does this necessarily capture who we're looking for? And to what do you refer when you say "technical work"?
It's time for other capable members of the community to set the direction for tiering policy.
Maybe it is, maybe it isn't. I fear that your aims have been a little optimistic though, because I'm not sure anyone here fits this description.
Ultimately, this all goes back to when I "had to" post that Wobbuffet thread and before. You left off "willing" in your list of ideal requirements, and it is the single most important characteristic anyone can have in this volunteer vocation that is Smogon. We already know who the willing people are, regardless of what other qualities they may have, because they post here, and post here consistently. Lack of willingness is a problem at Smogon. It's not changing, no matter how long your post is or how right you may be. If you haven't realized that by now, Cathy, you are going to remain frustrated for as long as you are here.
Cathy said:
The Smogon Council was a very bad idea. When it was first mentioned in #stark, I said in a private message that it was not even worth the time to argue with it, because no one would swallow it. Obviously, I was wrong. Smogon's culture of respect (people with status must be respected unconditionally) has prevented people from pointing out the obvious: that the Smogon Council was the worst idea since suspect experience. The Council was not even faster than a simple vote based on a simple rating/deviation metric. The Council consists of people handpicked by two people in a process based on nothing tangible and with no oversight. It's effectively no different from those two people banning pokemon by fiat. It may be better than the previous process, but that's a low bar.
Your underestimation of the totalitarian sway of the mighty Jumpman16 and Aeolus does not preclude your own absolute responsibility to have spoken your mind. As a user who has just successfully effected a change in the tiering process by suggesting the use of a supermajority. As an administrator of this site and this community. As an intelligent, well-spoken user whose input is very widely respected here. As a user of tenure regardless, whose input would be much more likely to have an impact regardless of anything else. And, most importantly, as a user who genuinely professes to care or have cared about the tiering process.
In light of all this, I would argue that it is more disappointing that even a user of your influence could be likened to the rest of a community that remains silent when needed most. It is disappointing at its core, and regardless of any personal affront you may feel this post of mine is, I want you to read that again. It is utterly disappointing that even you, Cathy, were silent when the community needed "proper" input the most. There is no getting around that. Are Aeolus and I honestly that respected or feared that we hardly have any opposition even from those literally most likely to topple our misguided regime? If this is true, then there actually may be some merit to appointing some new but not-too-respected leaders for Gen V's tiering, someone whose "stature" will not get in the way of the necessary voicing of concerns and opinions regarding the process itself. I'm being completely serious, by the way. If there is going to be a serious appeal to the "problem" of how respected leaders are as it relates to voicing necessary concerns, then we have a very, very big problem on our hands.
That brings us to today. Everybody knows the first process was a disaster. After all, the flaws with that first process are continually cited as the reason to introduce the council. This alone should raise eyebrows about the same people who designed that previous process having continuing influence on Pokémon policy. Although they don't realise it yet, they also messed up a second time with the "Smogon Council". Twice is more than enough chances. You may not agree with my personal position of not banning Pokémon before the game is released, but if there is one thing you should take away from the history of tiering in DP, it's that some new qualified people need to step up to the plate to spearhead tiering in the next generation. We should avoid banning things hastily. We have plenty of time to do it right. So long as we avoid developing a process as bad as paragraph submissions, top secret formulas, and other arbitrary delays and exclusions, we don't run the risk of wasting years this time. Such a working process is a simple vote with the only filter being a ladder statistic check.
I thought that Aeolus and I had already agreed months ago that the Smogon Council was largely a failure, but maybe somehow you would know better than we. I could make the argument that we have devised most of the Gen V tiering process after having learned from Gen IV better than anyone else and more comprehensively than anything your post may have said, but you probably wouldn't want to hear it.
The bottom line is that there is no justification for starting off the next generation with arbitrary bans. The DP ban list is already very long, and the next generation is only going to introduce more pokemon of a similar level of power, or revise older pokemon up to that level. Even the argument about saving time doesn't hold water, because, using a good process, we can balance the game far faster than was done this generation. The best process is a simple vote based on a completely open metric. This is efficient, fair, representative, and completely peer reviewable. Most importantly, we should not ban any pokemon without having played the game for a while.
I think you could have kept your post to this paragraph and had it be just as effective if not more. As you can see, there is discussion now about whether or not to begin Gen V with a banlist, which is what we should be talking about. Your "completely open metric" can and should be posted assuming you haven't already (we wouldn't want echoes of "mysterious" SEXP in Gen V).
In conclusion, I was only as compelled to respond to your long list of grievances as you were to post them in the first place. You can respond if you want to, I don't care too much either way. If you honestly and truly feel that Aeolus and I are not fit to lead the tiering directive in Gen V, that's fine. In this event, it would make sense for both of us to share what we learned during this process that we haven't shared (and that your post still has not addressed, though how could it coming from a third party), because we both necessarily know what we could have done differently better than anyone else. I'm not sure you are interested in this feedback though, and if I'm right, then as Aeolus said, the same things will happen in Gen V that happened in Gen IV. Hindsight is rather easy to have on pilot projects, but if you didn't do the steering yourself, then your next flight may be every bit as turbulent.