The decision to base UU off of 1760 stats

Antar · Mar 13, 2014

Unlike in many of my other threads, I'm going to be fairly lax in "pruning" replies to this post--as long as your post is vaguely on topic and doesn't violate any of the rules of the site, I probably won't delete it and will most likely try to respond. What I'm saying is if you just want to make a post that says, "this is a stupid decision, and you're all elitist pigs," go for it. But I hope you'll at least read through this post first and give your responses some thought.

So I realize I'm a little late in making this post, but better late than never, right?

As many of you are already aware, UU is soon to be official, and it was decided that the initial UU banlist will be based not off of "standard" stats but off of "1760" stats.

The purpose of this post is to clarify that decision, what it means, and what it means for the future of usage stats and tiering.

Background

Just over a year ago, we decided to move away from using unweighted stats for tiering and instead assign weights to players' teams based on the player's rating. For the full details, read here, but here's the tl;dr version:

Tiers are first and foremost threatlists: the Pokemon that are classified as OU are all Pokemon that your team *needs* to be able to deal with in order to succeed. It doesn't matter as much if your team gets demolished by, say, Shedinja, if Shedinja only appears in one out of every 200 matches*.
Building on that reasoning, if that 1 Shedinja in 200 is used solely by players who don't know how to use it, who give it defense investment* and run Giga Impact,** then your team with the huge Shedinja weakness is probably fine even then.
THUS, the weighting we assign to a player's team is based on the probability--given the player's Glicko rating--that the player is "above average," that is, more skilled than the "average" player, who by definition has a Glicko rating of 1500.
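
A minimal sketch of that weighting, not the actual stats script: it assumes a player's true skill is normally distributed around the Glicko rating, with the rating deviation as the standard deviation, so the weight is the chance of landing above the cutoff. That assumption reproduces the weights quoted in the sample calculations later in this post. The function name `weight` and its parameters are made up for illustration.

```python
import math


def weight(rating, deviation, cutoff=1500.0):
    """Probability that a player rated `rating` +/- `deviation` is above `cutoff`.

    Assumption (not confirmed by the post): true skill ~ Normal(rating, deviation**2).
    """
    z = (rating - cutoff) / deviation
    # Standard normal CDF written with the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# A brand-new player (1500 +/- 130) measured against the 1500 cutoff.
print(weight(1500, 130, 1500))  # 0.5
```

Under this model a player sitting exactly on the cutoff always gets a weight of 0.5, whatever the deviation is.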

The Problem

This definition worked fine for a time, and it was definitely an improvement over our previous, non-weighted stats system. But the problem was that Pokemon Showdown's popularity continued to grow, and while we are certainly seeing more competitive players battling on Showdown than ever before, the fact of the matter is that the overwhelming majority of players on Pokemon Showdown are not competitive players. This sounds elitist to say, but let me clarify: I'm not saying the problem is that players are using ineffective teams, where EV spreads are sub-optimal or where the players are using subpar sets or moves because they don't know any better--I'm saying the problem is players who have no desire to be competitive, who run monotype teams, Metronome teams, Anime-based teams (Blastoise's top teammate is Snorlax, with Pikachu at #4*) and teams with Hyper Beam variants (4%--weighted--of Sylveon run Hyper Beam, as do 5% of Heliolisk and 14% of Porygon-Z*). These players are not interested in being "the very best"--they're merely interested in having fun. And that's fine for them, but they should not be influencing our tiers, because a player who's in it to win will almost always beat a player who's just playing for kicks. So it doesn't matter if your team can't handle a good Blastoise if the only Blastoise you're actually seeing runs Hydro Cannon**.

Raising the Cutoff to 1760

To combat this problem, we are entertaining several solutions. My favorite is that we establish an "unrated" OU ladder on Showdown where players could play if they're not interested in laddering. But this does nothing for us in the short-term. In the short-term, the best and easiest solution is simply to raise the cutoff to the level we believe corresponds to the strength of the "average competitive" player.

Unfortunately, this is strictly a judgement call. Moving forward, I'll be producing some metrics that I'm calling "candles of known brightness," basically a series of indicators we can look at to objectively assess at which points gimmicks and non-competitive teams "fall away." In Little Cup, an example would be the use of Leftovers, Sitrus Berry and Assault Vest, which never have any competitive use in the metagame. These will be things that are completely unambiguous. Donphan being used when there are better options? Not a candle. Donphan carrying Giga Impact?** Yes.

In the short-term, however, we're just doing it by feel, and what felt right this month was 1760.

This means the cutoff won't always be 1760. Some months it might be lower, some higher. Nor will the cutoff be the same for all metagames (I suspect OU will have the highest cutoff by far). And the decision of where to put the cutoff won't be decided by me, but rather by the leaders or councils of each metagame. It'll work like this: each month, I'll provide the councils with a series of stats as well as metrics for "candle intensity," and they'll use those figures to decide where the cutoff will lie that month. Note that I will *not* be providing tiering councils with the list of changes to the tiers that would result from their decision--the justification for choosing a cutoff should never be "I want X out of UU but want UU to keep Y, so I'm choosing this number." The rationale has to be that the cutoff was chosen because these are the stats that best reflect the state of the competitive metagame.

An Aside: Monthly Usage Stats

As many of you have noticed and commented on, the usage stats threads have gotten a bit unmanageable--there are too many tiers and too many analysis types to fit neatly in one thread, and this problem is only going to get worse if I start generating stats at three or four cutoff levels each month. So starting with March's stats, instead of making a stats thread, I'll be putting all the stats at all the levels on a web server (which I actually do already) and then just linking to the web folder by way of an announcement.

Individual metagames can decide if they want dedicated threads for their tier posted in their subforums for the purpose of discussion, at which point the decision can be made about which cutoff(s) to post.

What Does This Mean For You?

The bottom line is that by raising the cutoff from 1500, the OU-UU list will better reflect the competitive OU metagame, and that's better for all involved: better for the OU player who's trying to decide who needs countering and better for the UU player, whose banlist better removes major threats. It means that if you're a competitive player, your contribution to our tiers will likely increase, and if you're not, then, well, you're probably not reading this thread.

I'd like to close with some sample calculations to give you an idea of how individual battles influence the usage stats.

Note that there were 2,549,546 OU battles last month on the ladder. (That's a lot.) Using a cutoff of 1500, the "average weight" was 0.559. With a cutoff of 1760, that number drops to 0.016. These numbers mean that the sum of all weights was 1,420,000 for 1500 and 40,800 for 1760.

Thus:

The player at the top of the OU ladder right now has a rating of 1933±28, which translates to a weight of 1.0 for both 1500 and 1760 stats, meaning each time that player battles, his or her team contributes roughly 0.00007% to the 1500 stats and 0.002% to the 1760 stats.
Compared to most "competitive" teams, my OU team is subpar (I'm working on it**). My current rating is 1711±45*. My weighting for 1500 stats is also 1, while my weighting for 1760 is 0.138. This means that one of my battles contributes roughly 0.00007% to the 1500 stats and 0.0003% to the 1760 stats.
A new player just starting out has a rating of 1500±130. That player's weight is 0.5 under 1500 and 0.0228 for 1760. One battle by such a player contributes roughly 0.00004% to the 1500 stats and roughly 0.00005% to the 1760 stats.
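
The three cases above can be reproduced with a short script. This is a sketch under two assumptions: the weight is the normal-CDF probability of being above the cutoff, and the sum of all weights is the battle count times the average weight. The names `weight`, `TOTAL_WEIGHT` and `contribution_percent` are made up for illustration.

```python
import math


def weight(rating, deviation, cutoff):
    """Assumed model: probability that Normal(rating, deviation**2) exceeds cutoff."""
    z = (rating - cutoff) / deviation
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


BATTLES = 2_549_546                           # OU ladder battles last month
AVERAGE_WEIGHT = {1500: 0.559, 1760: 0.016}   # figures quoted in the post

# Sum of all weights per cutoff (about 1,420,000 and 40,800).
TOTAL_WEIGHT = {cutoff: BATTLES * avg for cutoff, avg in AVERAGE_WEIGHT.items()}


def contribution_percent(rating, deviation, cutoff):
    """Share of the stats, in percent, contributed by one battle from this player."""
    return 100.0 * weight(rating, deviation, cutoff) / TOTAL_WEIGHT[cutoff]


players = {
    "top of ladder": (1933, 28),
    "1711 +/- 45": (1711, 45),
    "new player": (1500, 130),
}
for name, (rating, deviation) in players.items():
    for cutoff in (1500, 1760):
        w = weight(rating, deviation, cutoff)
        share = contribution_percent(rating, deviation, cutoff)
        print(f"{name:>13} @ {cutoff}: weight {w:.4f}, {share:.6f}% per battle")
```

The printed shares agree with the rounded percentages above to within rounding.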

Here's the take-away: unless you're at the very top or the bottom half of the ladder, your contribution to the stats won't change much (if anything it'll rise slightly). But if you're a bad player, your contribution will be removed, and if you're one of the very best players, your contribution will mean a lot, lot more.

Any Questions?

As I said at the top of the post, feel free to ask anything. I'm not going to be nearly as ruthless in terms of deleting posts as I have been in my other threads.

Footnotes

*This is actually true
**This is exaggerated

T-Bolt · Mar 13, 2014

This means the cutoff won't always be 1760. Some months it might be lower, some higher. Nor will the cutoff be the same for all metagames (I suspect OU will have the highest cutoff by far). And the decision of where to put the cutoff won't be decided by me, but rather by the leaders or councils of each metagame. It'll work like this: each month, I'll provide the councils with a series of stats as well as metrics for "candle intensity," and they'll use those figures to decide where the cutoff will lie that month.

Will the number of players above different cutoffs be part of these stats? Just so that there is a sufficiently large sample space to base usage stats on (not sure if sample space is the correct term here)

Seiterman · Mar 13, 2014

I'm very happy to see this change. As much as I've enjoyed seeing new players get into competitive mons the past few months, the metagame is first and foremost supposed to be competitive. Rewarding competitive players with more influence (as well as possibly giving a new incentive for new players to become competitive and knowledgeable) is wonderful in and of itself. Allowing Pokemon like Donphan and Starmie into UU while bringing back up threats like Keldeo and Latias should make for healthier metagames.

As a side note, I guess Gengar is now the only Pokemon who's stayed OU every generation.

Antar · Mar 13, 2014

T-Bolt, the cutoffs aren't hard cutoffs. The weighting system simply measures the probability that a player's rating is above X. Check out the sample calculations I posted in the OP for the gist of how the system works, or read more in the Weighted Stats FAQ.

In any case, the calculations I performed above are all ones you can perform solely with data that are in the stat reports.
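
To illustrate what "not a hard cutoff" means, here is a rough sketch that holds the rating deviation fixed and sweeps the rating across the cutoff. It assumes the weight is the normal-CDF probability of being above the cutoff. The deviation of 45 and the rating range are arbitrary picks for illustration.

```python
import math


def weight(rating, deviation, cutoff):
    """Assumed model: probability that Normal(rating, deviation**2) exceeds cutoff."""
    z = (rating - cutoff) / deviation
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


CUTOFF = 1760
DEVIATION = 45  # hypothetical, held fixed so that only the rating varies

ratings = range(1600, 1921, 40)
weights = [weight(r, DEVIATION, CUTOFF) for r in ratings]

# The weight climbs smoothly from near 0 to near 1 instead of jumping at 1760.
for rating, w in zip(ratings, weights):
    print(f"{rating}: {w:.3f}")
```

A player well below the cutoff still counts a little, and a player just above it does not yet count in full.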

LinkRaider · Mar 13, 2014

This doesn't mean that ONLY players who have an Elo above 1760 will be taken into account, right?

Atm I have an Elo of 1758, a GXE of 70.9 and a Glicko-1 of 1668 ± 28. How will my performance affect the tiers? Sorry for my ignorance but I really want to understand the system.

BurningMan · Mar 13, 2014

LinkRaider said:
This doesn't mean that ONLY players who have an Elo above 1760 will be taken into account, right?

Atm I have an Elo of 1758, a GXE of 70.9 and a Glicko-1 of 1668 ± 28. How will my performance affect the tiers? Sorry for my ignorance but I really want to understand the system.

No, all players influence the stats, but with a higher rating you have a bigger impact on them.

Antar · Mar 13, 2014

LinkRaider said:
Atm I have an Elo of 1758, a GXE of 70.9 and a Glicko-1 of 1668 ± 28. How will my performance affect the tiers? Sorry for my ignorance but I really want to understand the system.

You're actually an unfortunate fringe case. Your weighting for 1500 is 1.0 and 0.000509 for 1760, so your battles barely count at all towards 1760 tiering, even while they counted as much as the very best player's when the cutoff was 1500. I know that's a bit fucked up. I'm sending you a PM to follow up.
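
For anyone who wants to check those two figures, here is the arithmetic. It assumes the weight is the standard normal CDF $\Phi$ evaluated at the distance from the cutoff in units of rating deviation. Note that it uses the Glicko-1 rating and deviation (1668 ± 28), not the Elo of 1758.

```latex
w_{1500} = \Phi\!\left(\frac{1668 - 1500}{28}\right) = \Phi(6.0) \approx 1.0
\qquad
w_{1760} = \Phi\!\left(\frac{1668 - 1760}{28}\right) \approx \Phi(-3.29) \approx 0.000509
```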

Eeyore · Mar 13, 2014

How will the "Power" idea come into play here, without banning too many pokemon?
My concern is the no-man's-land of BL, BL2 and such. From what I've seen, those Pokemon were rarely used in B/W and B/W2 here: they are deemed too powerful for a tier such as UU, and yet are never seen in OU. (My first thought is Donphan, closely followed by Zapdos: one of which fell, and may be deemed too strong for UU, and the other rose, and is outclassed in OU.)

Antar · Mar 13, 2014

Eeyore, we'll have to see what the UU suspect council decides to put in its official initial UU banlist, but I can tell you now that upping the cutoff from 1500 to 1760 moved a lot of Pokemon that were BL into OU (suggesting that 1760 stats are doing a better job of correlating usage with "power").

Da Pizza Man · Mar 13, 2014

Hopefully this means Donphan will be UU

Antar · Mar 13, 2014

The Pizza Man said:
Hopefully this means Donphan will be UU

kokoloko said:
Hello, Donphan / Klefki / Starmie / Smeargle / Tentacruel / Sableye / Cloyster / Forretress / Galvantula / Salamence / Trevenant

Toljik · Mar 13, 2014

Right now there are no analysis pages for the Gen VI metagame on the main site. Could this be a reason why people aren't using optimal sets?

Antar · Mar 13, 2014

Toljik said:
Right now there are no analysis pages for the Gen VI metagame on the main site. Could this be a reason why people aren't using optimal sets?

For sure. But it's not the suboptimal sets we're so much concerned with--it's the stuff that no competitive player would *ever* think was good (like Leftovers in Little Cup).

magibas · Mar 13, 2014

Ahh, I see. That seems to make sense. I do like the idea of an unrated ladder though. You guys should implement that so that people who are just playing for fun, trying new things, or don't really know what they're doing can go there and not throw a potential wrench in your usage stats. Or what if you made a separate "premier ladder" where you had to reach at least a certain ranking to even get in? Anyways, the 1760 cutoff seems like a good way to collect more accurate data and make better tiers :)

Boltaway · Mar 13, 2014

So what happens if 1760 stats somehow aren't enough?

Also, what will happen with RU/NU? I don't feel as though 87,606 and 167,646 battles* compare to OU's 2 million plus. I know the plan will probably involve lowering the rating req, but doesn't that really fail to fix the problem? (Although, arguably, the RU/NU playerbases are much more competitive than OU's.) And if I'm not mistaken, RU/NU 1760 players have a lot more weight on the statistics than OU players, so is a ratings cutoff to determine tiers necessary in that case?

* I just used September 2013 stats since I assumed they'd be more active than now, which is true.

Antar · Mar 13, 2014

Boltaway, we will be evaluating the cutoff--balancing filtering noncompetitive players vs. preserving a large sample set--every month, for every metagame involved in tiering (OU, UU, RU, possibly Doubles, likely not NU), and the cutoffs will be adjusted--up or down--based on what we see. I will always push for making the cutoff as low as possible, and I think that will be appropriate for most of the less active tiers (I think 1500 is a fine cutoff for LC, for example, but LC's not involved in tiering, so...).

Or we'll have those unrated ladders and the problem might go away.

We'll see.

The other point is, we made do with tiers based on *much* smaller sample sets back in the PO days (try searching for some of the 2011 stats threads), and anyway, I fully expect there to be more Gen VI RU and NU players than there were ever Gen V.

NnoitraGilga · Mar 13, 2014

This is just a question, but what about an individual with multiple laddering accounts, say one ranked 1640 and another ranked 1708? Does that mean said individual affects the percentage more than others?

Zebstrika · Mar 13, 2014

I'm pretty sure the usage stats are counted per battle, rather than per account or user.

migetno1 · Mar 14, 2014

With the move to use 1760 stats, would it be possible for you to release 1760 weighted json data (in the chaos folder)?

HikozaruYes · Mar 14, 2014

Boltaway, I definitely think that a rating cutoff is necessary in RU because back in Gen V I remember when someone spammed the RU ladder and made Metang RU. I think that using a rating cutoff to decide tiers is used to promote a competitive metagame on the smogon battle servers without alienating the casual players.

Arcticblast · Mar 14, 2014

HikozaruYes said:
Boltaway, I definitely think that a rating cutoff is necessary in RU because back in Gen V I remember when someone spammed the RU ladder and made Metang RU. I think that using a rating cutoff to decide tiers is used to promote a competitive metagame on the smogon battle servers without alienating the casual players.

Except last generation Molk did it while actually doing well in battle (and playing a fuck ton of games). Weighted stats were originally used after stats were manipulated, however, with players using three low RU Pokemon and three NU Pokemon and purposefully losing to inflate their RU usage and move them into or prevent them from moving out of the tier.

Antar · Mar 14, 2014

migetno1 said:
With the move to use 1760 stats, would it be possible for you to release 1760 weighted json data (in the chaos folder)?

Anything that gets moveset stats will get that json file.

HikozaruYes said:
Boltaway, I definitely think that a rating cutoff is necessary in RU because back in Gen V I remember when someone spammed the RU ladder and made Metang RU. I think that using a rating cutoff to decide tiers is used to promote a competitive metagame on the smogon battle servers without alienating the casual players.

You're conflating two events: Molk got Metang into RU, and he wasn't a bad player--the spamming of shit Pokemon in RU was a separate event that got us to switch to rated stats in the first place (ninja'd).

HikozaruYes · Mar 14, 2014

I didn't mean to sound like I was bashing Molk, I have the utmost respect for him as a player. I was just using that as an example of how one person can play a ton of battles to influence the usage stats and tiering of a pokemon.

Antar · Mar 14, 2014

HikozaruYes, it's a valid point, and I don't think anyone was suggesting you were bashing Molk. The point is just that no weighting system can prevent a top-tier player from boosting a particular Pokemon... if they can find a way to make it viable or can succeed even with having only five viable Pokemon on a team. See:

Antar said:
Will this stop Molk? For those with short memories, a few tier updates ago, Molk managed to get Metang into RU (up from PU) by making a decent team built around Metang. I understand the team did decently well. This was back in the PO days, so we don't know what his ranking would've been, but it seems likely it would've been a bit above 1500. Thus, this rating system would not have prevented Molk from getting Metang into RU. The difference here between what Molk was doing and what our recent "tier troll" did is that Molk's team was actually viable. So if you were playing RU at that time, and you happened to have no way of dealing with Metang, you would've legitimately been in trouble.

RicepigeonKKM · Mar 14, 2014

Sorry if this sounds silly, but now that the cutoff for tiering is 1760, will this only apply to Gen 6 OU or will it be retroactive to the tiering lists of previous gens as well? Looking at the 1760 stats for February would indicate that Chansey, Slowbro, Stoutland, Weavile, and Zapdos will rise to Gen 5 OU while Cloyster, Conkeldurr, Dugtrio, Gastrodon, Hydreigon, Infernape, Jolteon, Lucario, Metagross, and Vaporeon will fall down to Gen 5 UU... Or will the previous gens be unaffected by this?
