Data Viability Ceiling: A Measure of How Far a Pokemon Can Take You

A few hours ago I updated July's moveset statistics to include a new piece of data, a measure I call a Pokemon's "Viability Ceiling."

Simply put, a Pokemon's Viability Ceiling corresponds to the highest GXE of any player using that Pokemon.

For the uninitiated, GXE is probably our best and most accurate measure for determining player skill and ranking top players. A player's GXE corresponds to the percentage of matches that player would expect to win on a ladder with no matchmaking. The PS ladder is sorted by Elo mainly because it's the more familiar rating system. Due to fundamental differences in how the two ratings are computed, there is no way to directly convert from Elo rating to GXE.
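To make "percentage of matches a player would expect to win on a ladder with no matchmaking" concrete, here is a rough sketch. It assumes an Elo-style logistic win probability and averages it over every opponent on a hypothetical ladder. The real GXE is computed differently (from a Glicko-style rating and its deviation), so treat this only as an illustration of the idea, not the actual formula.

```python
def expected_score(rating: float, opponent: float) -> float:
    """Elo-style logistic probability that `rating` beats `opponent` (assumption)."""
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / 400.0))


def gxe_sketch(rating: float, ladder: list[float]) -> float:
    """Expected win percentage against an opponent drawn uniformly at random
    from `ladder`, i.e. with no matchmaking. Illustrative only."""
    if not ladder:
        raise ValueError("ladder must contain at least one opponent rating")
    return 100.0 * sum(expected_score(rating, opp) for opp in ladder) / len(ladder)


# Hypothetical ladder ratings, not real PS data
ladder = [1200.0, 1350.0, 1500.0, 1650.0, 1800.0]
print(round(gxe_sketch(1700.0, ladder), 1))
```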

So what this measure tells you is just how far it's possible to get with a team that has that Pokemon on it. If an RU Pokemon has a Viability Ceiling of 85, that means *someone* made it into the top 40 on the RU ladder using that Pokemon (Yes, I can tell you who. No, I won't tell you who).

Just *how* many top players are secretly using "shit mons" on their teams? If you're interested in finding out and you have the programming acumen, the json files in the "chaos" folder give a little more detail. "Viability Ceiling" is in the json files as an array of four numbers: number of players using the mon (a useful number in its own right!), top GXE, 99th percentile GXE and 95th percentile GXE. So whereas it's one thing to say that a top-10 player on the Doubles OU ladder was using Dragonite, it's another to say that 1% of 2744 Dragonite users are among the top 4.6% (611) DOU players.
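As a sketch of how that four-number array could be derived from the GXEs of everyone who used a given mon: the nearest-rank percentile convention below, and reading "99th percentile GXE" as the GXE of the user at the boundary of the top 1% of that mon's users, are assumptions on my part. The chaos files may compute it differently.

```python
def nearest_rank(ranked_desc: list[float], top_percent: int) -> float:
    """GXE of the user at the boundary of the top `top_percent`% (nearest-rank)."""
    # Integer ceiling division avoids float rounding in n * top_percent / 100
    k = max(1, -(-len(ranked_desc) * top_percent // 100))
    return ranked_desc[k - 1]


def viability_ceiling(gxes: list[float]) -> list[float]:
    """[number of users, top GXE, 99th percentile GXE, 95th percentile GXE]."""
    if not gxes:
        raise ValueError("need at least one user")
    ranked = sorted(gxes, reverse=True)  # best user first
    return [len(ranked), ranked[0], nearest_rank(ranked, 1), nearest_rank(ranked, 5)]


# Hypothetical user GXEs for one Pokemon
print(viability_ceiling([90, 72, 68, 85, 61, 77, 55, 80, 49, 66]))
```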

My ambition with presenting this analysis is for people to stop blaming certain mons' failure to drop on "garbage players using garbage mons" and instead acknowledge that at least one of two things is happening:
  1. Either some of these mons are a bit more competitive than the viability rankings would have one believe, or,
  2. We have some very talented trolls at work on our ladders.
 
I've gone ahead and published a spreadsheet of viability ceilings for RU.

It might also be useful to have some concept of how GXEs translate into ranking and percentile on the RU ladder, so here's a quick breakdown. Keep in mind, these numbers do NOT generalize to other ladders (they'll look completely different for, say, OU):
  • The top RU player has a GXE of 90.
  • 16 players (99.8th percentile) have a GXE of 87 or above
  • A GXE of 85 corresponds to the 37th top player (99.7th percentile)
  • The person ranked 187 (98th percentile) has a GXE of 80
  • 544 players (95th percentile) have a GXE above 75
  • A GXE of 70 roughly corresponds to the 89th percentile (top 1320)
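Going from a GXE to a rank and percentile like the breakdown above only needs the ladder's GXEs sorted once. A minimal sketch, using made-up ladder values (as noted, real numbers differ from ladder to ladder):

```python
from bisect import bisect_left


def rank_and_percentile(gxe: float, ladder_gxes: list[float]) -> tuple[int, float]:
    """Return (players with GXE >= gxe, percentage of players below gxe)."""
    if not ladder_gxes:
        raise ValueError("ladder must not be empty")
    ascending = sorted(ladder_gxes)
    below = bisect_left(ascending, gxe)  # count of GXEs strictly below `gxe`
    at_or_above = len(ascending) - below
    return at_or_above, 100.0 * below / len(ascending)


# Hypothetical ladder, not the real RU data
ladder = [90, 87, 85, 80, 75, 70, 60, 50, 50, 40]
players, pct = rank_and_percentile(85, ladder)
print(f"{players} players at or above GXE 85 ({pct:.1f}th percentile)")
```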
With this context, here are a few highlights from that spreadsheet:
  • The most viable (least unviable?) Pokemon are Accelgor (B+), Alomomola (A+), Aromatisse (A-), Cobalion (UU), Drapion (B+), Emboar (A), Flygon (A), Garbodor (C), Gurdurr (B+), Hitmonlee (A-), Medicham (B+), Meloetta (A+), Rhyperior (A), Sneasel (A+), Spiritomb (B+), Steelix (A+), Virizion (A+) with Viability Ceilings of 90. S-ranked Scrafty and Tyrantrum both have Viability Ceilings of 89, which is basically the same.
  • The Viability Ceiling of the other S-ranked Pokemon, Glalie, is significantly lower, at 87. Is this because none of the top players want to use Glalie, or because no one can figure out how to use Glalie successfully? Well, that's the question, isn't it?
  • Typhlosion's Viability Ceiling is a perfectly respectable 85, placing it alongside Mesprit (B+), Omastar (B-), Gourgeist-Super (B-) and Malamar (B-), and above B- Granbull, Audino and Jynx.
  • Ambipom's Viability Ceiling is 83 (as is Cinccino's). This Viability Ceiling is shared by B-ranked Piloswine and Sawk.
  • Torterra, despite being ranked B-, has a relatively atrocious viability ceiling of 77.
  • So far, the lowest Viability Ceiling I've found for a viability-ranked Pokemon is D-ranked Linoone's Ceiling of 68.
 
Clearly these stats are getting at something very different than the traditional viability rankings. And that makes sense: what a highly skilled player can "get away with" using is going to be very different from what a less experienced trainer should be using to maximize success. I really don't mean this thread as an attack on the fine work done by the folks who put together viability rankings. I mainly just thought it was a cool metric.
 
Also, some python code for generating spreadsheets like the one in the second post, using the unweighted OU file as an example.

Code:
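import json

# Load the unweighted ("-0") OU chaos file
with open('ou-0.json') as f:
    chaos = json.load(f)

# One CSV row per Pokemon: name, number of users, top GXE,
# 99th percentile GXE, 95th percentile GXE
for poke, stats in chaos['data'].items():
    vc = stats['Viability Ceiling']
    print(','.join([poke] + [str(x) for x in vc]))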
 

Imanalt

My ambition with presenting this analysis is for people to stop blaming certain mons' failure to drop on "garbage players using garbage mons" and instead acknowledge that at least one of two things is happening:
  1. Either some of these mons are a bit more competitive than the viability rankings would have one believe, or,
  2. We have some very talented trolls at work on our ladders.
The other explanation for this is that even the relative top of the ladder is still much worse than the actual top players in each tier, who rarely ladder because of the ladder's low quality.

And that is one of the two core problems with this metric. Almost any good player will tell you that on most, if not all, ladders, GXE means very little. The quality of opposition is so poor that almost any player above what most people would consider a barely adequate threshold should win almost every game. On top of that, the strategies that maximize your wins change when your expected win rate is well above 50%, because consistency becomes favored to an extreme degree.

The second major problem is that one person using a mon once does not necessarily mean much. A much better method would be based on the rank of the Xth-best player using the mon, but this runs into the issue that not all roles for mons necessarily result in equal usage, even if the mons are equally "good." Alternatively, something like the GXE record a mon would have across all games played with it would potentially tell us something, although if the mon is very popular with poor players, that could bias the number. Realistically, we do not have the data to come up with one number that even comes close to approximating a Pokemon's viability ceiling, or viability rank. What we do have is the ability to weigh several metrics to get a more objective look at what types of players tend to use a mon and how successful they are with it, and that is what could actually be valuable to pursue.
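One way to read the suggestion of basing the metric on the Xth user of a mon is to report the GXE of the k-th best user instead of the single best one, so that one high-rated player using a mon once can't set the number alone. A minimal sketch; `k` is an arbitrary illustrative parameter, not a value anyone in the thread proposed:

```python
import heapq
from typing import Optional


def kth_best_gxe(gxes: list[float], k: int) -> Optional[float]:
    """GXE of the k-th best user of a Pokemon, or None if fewer than k users."""
    if k < 1:
        raise ValueError("k must be at least 1")
    if len(gxes) < k:
        return None  # not enough users to say anything
    return heapq.nlargest(k, gxes)[-1]


# Hypothetical data: one 89 GXE alt plus a handful of ordinary users
print(kth_best_gxe([89, 61, 58, 57, 52, 50], k=3))
```

Counting distinct players rather than teams would matter here too, since otherwise one person could fill several of the top k slots.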

Sorry if I'm not clear with some of this; I'm tired as hell and it's 1 am.
 

marilli

I would wager on your second option, with a small catch: we have trolls who are 'talented' *by ladder standards*. It's a slight difference, but it makes all the difference. Most of the players I would consider decent have the ability to peak on the ladder if they had the time and will to do so. Antar, you are the master data miner here, and you know full well that most PS players are people spamming Ash teams or in-game teams. In that light, some of the people using Ambipom are really talented.

And I love how you don't mention that the 95th percentile of Ambipom users have a GXE of 66, a far cry from the ladder-wide 95th percentile GXE of 75. At a simple glance, without even needing to do real work, a 95th percentile GXE of 66 looks lower than that of just about any other Pokemon that's RU by usage. The exact same can be said of Cinccino. Typh looks a bit better, but that's beside the point.

The highest GXE among a Pokemon's users depends so much more on the player than on the Pokemon that any data analysis built on it seems likely to get drowned out very easily. But the numbers are there, and they seem to confirm what people believe. Remember, the complaint isn't that Ambipom is unusable and will make your team lose; at the very least it can Fake Out, and it makes great death fodder if worst comes to worst. The complaint was that the bulk of Ambipom / Cinccino usage is made up of players who have no grasp of competitive Pokemon (well, they might know what competitive Pokemon is, but they're just clicking buttons a la "It's super effective! I'll go for that super effective move!"). The numbers are there if you're willing to look at them that way.

Of course, a valid argument against this is that humans are imaginative creatures who can conjure up just about any excuse and turn it into an argument for their beliefs, but the same can be said of you, who openly stated that your ambition is to prove these Pokemon are more viable than people believe. Honestly, as an objective third party who doesn't really play RU anymore, I think the Ambipom / Cinccino stats are really just confirming what people believed.

edit: love stats tho so if i can figure out how json files work i'd love to take a closer look at it when i have more time!
 
The complaint was that the bulk of Ambipom / Cinccino usage is made up of players who have no grasp of competitive Pokemon
Keep in mind, our weighted stats system takes care of the vast, vast majority of such players. And yes, Ambipom's 95th percentile GXE is extremely low. But more users (5458) use Ambipom than any other Pokemon save Jolteon. If you compare not by percentile but by sheer number of users, Ambipom lines up pretty close to C+ rated Shiftry (possibly other more viable mons too; I'll have to dig deeper into the data).
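The post doesn't spell out how the weighting works. Purely as a hypothetical illustration of how rating-based weighting can shrink the contribution of weak players, the sketch below weights each team by the probability that its player's true rating exceeds a cutoff, modelling that rating as a normal distribution centred on the player's rating with their rating deviation as the spread. The function names, the cutoff, and the normal model are all assumptions here, not a description of the actual stats scripts.

```python
import math


def team_weight(rating: float, deviation: float, cutoff: float) -> float:
    """Hypothetical weight: P(true rating > cutoff) under Normal(rating, deviation)."""
    if deviation <= 0:
        return 1.0 if rating > cutoff else 0.0
    return 0.5 * math.erfc((cutoff - rating) / (deviation * math.sqrt(2.0)))


def weighted_usage(teams, cutoff: float) -> dict[str, float]:
    """teams: (rating, deviation, pokemon_on_team) triples.
    Returns each Pokemon's share of the total team weight."""
    totals: dict[str, float] = {}
    grand = 0.0
    for rating, deviation, mons in teams:
        w = team_weight(rating, deviation, cutoff)
        grand += w
        for mon in set(mons):  # count each Pokemon once per team
            totals[mon] = totals.get(mon, 0.0) + w
    if grand == 0.0:
        return {}
    return {mon: t / grand for mon, t in totals.items()}


# Hypothetical teams: (rating, deviation, Pokemon on the team)
teams = [
    (1750.0, 60.0, ["Ambipom", "Alomomola"]),
    (1150.0, 120.0, ["Ambipom", "Cinccino"]),
    (1600.0, 80.0, ["Alomomola", "Sneasel"]),
]
for mon, share in sorted(weighted_usage(teams, cutoff=1500.0).items()):
    print(f"{mon}: {share:.1%}")
```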
 
One thing these numbers aren't really taking into account, and which is a big part of what "viability" means, is outclassedness. Often, a Pokemon has low viability not so much because it is bad and using it will make it a lot harder to win, but because there is just a better option available which will make it slightly easier to win and has essentially no downsides in comparison.
 

marilli

Keep in mind, our weighted stats system takes care of the vast, vast majority of such players. And yes, Ambipom's 95th percentile GXE is extremely low. But more users (5458) use Ambipom than any other Pokemon save Jolteon. If you compare not by percentile but by sheer number of users, Ambipom lines up pretty close to C+ rated Shiftry (possibly other more viable mons too; I'll have to dig deeper into the data).
I'm aware of the weighted usage stats. Still, the fact remains that just by consistently beating Ash teams you carry tolerable weight. The complaint is that, even with weighting, the share of weighted usage attributed to below-average competitive players (the bottom half of users, with Ash teams subtracted) is much higher for Ambipom than for any other RU Pokemon, and that still seems to be true.

You don't need to use an Ash team or an in-game team to have little grasp of competitive Pokemon. People who are just starting out will have teams that are intended to be competitive, and so beat Ash teams, but aren't actually competitive. I also know for a fact that you can get really high on the ladder with an "Ash team" (or similarly anime- or manga-inspired teams) if you put your mind to it. Doesn't mean the team is good, lol.

Apologies for not being anal about usage weight vs usage, I thought it'd be obvious what I was talking about.

I do agree there's a fair bit of politicking going on with viability ranks, and many people believe it's their duty to curb usage of Pokemon that get more usage than they deserve, which never works because it targets the wrong audience. I disagree with this practice, but the reality is that mons below a certain threshold are kind of equally bad, in that you'd rarely use them other than as a joke.

edit: It's not rocket science to realize that a 'stat' that can be skewed by a single person who uses a mon once on a high-rank alt doesn't have any real meaning. I could go on my 89 GXE alt right now and use Pikachu once, and Pikachu would have a Viability Ceiling of 89 regardless of how good it actually is, if I understand this correctly.

The concern was never "why would you ever use Ambipom, it will instantly make you lose even vs. bad players." It was that Ambipom stays in RU despite not being RU material. Note that the 95th percentile GXE among Ambipom users is 66, and the Pokemon with a lower 95th percentile GXE are absolutely garbage in RU, such as Frogadier, Articuno, and Muk. Given this context, why do you think Ambipom's larger raw usage magically excuses the fact that a large number of below-average "competitive" players make up its weighted usage? Isn't that exactly the concern many players have? Why ask what the highest GXE a certain Pokemon can be used at is, instead of looking at the distribution of GXE across each Pokemon's weighted usage? (For example, a graph showing what percentage of Ambipom's weighted usage comes from players with GXE between 85 and 90, as opposed to players with GXE between 50 and 60, and other GXE ranges.)
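The distribution suggested here can be sketched by bucketing one Pokemon's users by GXE and summing each bucket's usage weight. The input format (pairs of GXE and weight) and the band edges are hypothetical choices for illustration:

```python
from bisect import bisect_right


def usage_share_by_band(users: list[tuple[float, float]],
                        edges: list[float]) -> dict[tuple[float, float], float]:
    """Share of one Pokemon's weighted usage coming from each GXE band.

    users: (gxe, weight) pairs; edges: ascending band boundaries.
    Bands are [lo, hi); the top band also includes GXE == edges[-1].
    Users outside [edges[0], edges[-1]] are ignored.
    """
    totals = [0.0] * (len(edges) - 1)
    for gxe, weight in users:
        i = bisect_right(edges, gxe) - 1
        if gxe == edges[-1]:
            i = len(edges) - 2  # close the top band on the right
        if 0 <= i < len(totals):
            totals[i] += weight
    grand = sum(totals)
    return {
        (edges[j], edges[j + 1]): (totals[j] / grand if grand else 0.0)
        for j in range(len(totals))
    }


# Hypothetical users of one Pokemon: (GXE, usage weight)
users = [(88.0, 1.0), (55.0, 0.5), (52.0, 0.5), (71.0, 0.25)]
for band, share in usage_share_by_band(users, [50, 60, 70, 85, 90]).items():
    print(band, f"{share:.0%}")
```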

I do understand you feel it's necessary to point out that you can get high on the ladder with below average Pokemon. This is true! But that's not what people were concerned with to begin with, and to claim that Ambipom is better than Sawk because of this data sounds... off, to say the least.
 
One thing these numbers aren't really taking into account, and which is a big part of what "viability" means, is outclassedness.
See, personally I think those are two separate things. I don't think a Pokemon's viability should be affected by whether or not there are better options. Not saying we shouldn't be giving folks "Use X not Y" advice, just saying that's not really "viability," imo.

edit: It's not rocket science to realize that a 'stat' that can be skewed by a single person who uses a mon once on a high-rank alt doesn't have any real meaning. I could go on my 89 GXE alt right now and use Pikachu once, and Pikachu would have a Viability Ceiling of 89 regardless of how good it actually is, if I understand this correctly.
This month, yes. Next time, no, probably not, as I plan on "troll-proofing" this analysis (still working out the implementation details).
 
This is really interesting. I'd love to see how this changes over time. In a few months I'll try to make some charts of the 95th percentile for the major metagames to see how things have shifted as Pokemon fall in and out of style and get better or worse. I assume there isn't a way to get this information from before July, is there?
 

Dread Arceus

Cool stuff, but what's a Celing? Also, is there a way to make it, like, an average of the top 5 players that use it rather than just one? Only saying this cuz people like WECAMEASROMANS are gonna do their best to break your stats with like Ninjask and Castform lol
 
Cool stuff, but what's a Celing?
How did NO ONE notice this until now, lol? It's fixed.
Also, is there a way to make it, like, an average of the top 5 players that use it rather than just one? Only saying this cuz people like WECAMEASROMANS are gonna do their best to break your stats with like Ninjask and Castform lol
Next time... I plan on "troll-proofing" this analysis (still working out the implementation details).
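For reference, the averaging idea from the question could look something like the sketch below. This is only one possible approach, not the troll-proofing the reply says is still being worked out; `k=5` comes from the question, and the example data is made up:

```python
import heapq
from typing import Optional


def top_k_mean_gxe(gxes: list[float], k: int = 5) -> Optional[float]:
    """Mean GXE of a Pokemon's k best users, or None if it has fewer than k."""
    if k < 1:
        raise ValueError("k must be at least 1")
    if len(gxes) < k:
        return None  # too few users for a meaningful average
    return sum(heapq.nlargest(k, gxes)) / k


# Hypothetical: one 89 GXE alt uses Pikachu once; everyone else sits near 60
print(top_k_mean_gxe([89, 60, 60, 60, 60, 60]))
```

With k = 5, one outlier shifts the result by only a fifth of the gap between its GXE and that of the user it displaces from the top five; several coordinated alts could still move it, though.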
 

Arcticblast

How did NO ONE notice this until now, lol? It's fixed.
I actually didn't notice it in Dread Arceus's post until I re-read it, so I don't think anyone's blaming you.

I read this thread the day it was posted and I share some of marilli's concerns but I'll work out how to articulate those at a later date (or more likely never because lol me getting things done)
 

toshimelonhead

This type of data might be better suited to a histogram format with GXE as the X-axis and (relative) frequency used as the Y-axis. Solely having percentiles doesn't give people a good enough idea as to the viability of a pokemon at different levels on the ladder if that is the question you are trying to solve here.
 
