Sunday, March 21, 2010

Fantasy Basketball Stats: Standard Deviation and You

Hello Basketbawful readers! Due to some circumstances, I will be revealing my end of the season fantasy basketball analysis a bit early. Although my writing style is typically humor and sarcasm with some stats thrown in (okay, more like sarcasm and pessimism), today I’d like to take a more refined look at the stats of fantasy basketball.

The Intro:

So another year of fantasy basketball is winding to a close. Maybe your team got pounded by injuries; maybe your team had Dirk, Nash, and David Lee and cruised to victory (like mine). There are many different methods out there to look at and evaluate player performance, and there are lots of ranking systems. Sure LeBron was obviously #1, but what about down the list? Do you really trust those pre-rankings? Today I’m going to talk about a method of evaluating the numbers, so hopefully during next year’s draft you can use your 90 seconds scrambling for injury and team information while having some confidence in the numbers to expect.

The Method:

We are already quite comfortable with using averages in sports stats. LeBron scored 29.9 points per game. Dwight grabbed 13.6 rebounds per 36 minutes. So instead of jumping to PER or RAPM or some other complex analysis, why not go to just the next step with standard deviation? In fantasy, we have the entire population (all players that have logged minutes in an NBA game), and all we really care about is choosing the guy that is better than what the other teams have. Standard Deviation could fit this need!

Well let’s not get ahead of ourselves. For example with Yahoo!, you get all the raw number totals and averages, and even their special “O-Rank” and “Rank”. Why expand beyond that? Well the problem is, when you sort by FG%, or TOV, things start looking strange. Is Marc Gasol’s 58.3% going to help your team more than David Lee’s 55.2%? Just how bad is Dwight Howard’s 60.3% FT shooting going to kill that category? Kinda hard to tell by eyeballing it. Even with the raw numbers: just how much will having Steve Nash on my team dominate the assists category?

The Good News:

Enter: Standardize. If you really don't want to do math, then I’ve still got good news for you: this is all done by ESPN’s Fantasy Basketball Player Rater. In fact, if you are quite satisfied using just the ESPN Player Rater, you probably can stop reading the article now. Here you can see all the Standard Scores in each category, and they are added up to make the final column as a composite score (yes, this makes more sense than adding percentages together randomly *coughHOLLINGERcough*).

For example: LeBron has scored 2033 points as of this post. The league average is 472.5 and the standard deviation is 410.25 pts. So (2033 – 472.5) / 410.25 = 3.8, meaning LeBron is 3.8 standard deviations above the league average. For those not familiar with standard deviations, a score of 1 puts you above ~84.1% of a normally distributed population, 2 puts you above ~97.7%, 3 puts you above ~99.9%, and 4+ is outstanding. Isn’t this what you really want to know on draft day? You can find overall contributors at a glance, see which needs you are lacking, and pick up specialists without having to guesstimate from the raw numbers.
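Here is a minimal Python sketch of that arithmetic, using only the numbers quoted above. The function names are mine, and the percentile helper assumes a normal distribution, which is exactly the assumption behind the 84.1 / 97.7 / 99.9 figures.

```python
import math

def standard_score(value, mean, sd):
    """Number of standard deviations `value` sits above (or below) the mean."""
    return (value - mean) / sd

def normal_percentile(z):
    """Fraction of a normal population that falls below a standard score z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Figures quoted in the post: LeBron's points, league mean, league SD.
lebron_z = standard_score(2033, 472.5, 410.25)
print(round(lebron_z, 1))  # 3.8

# Percent of a normal population sitting below scores of 1, 2, and 3.
for z in (1, 2, 3):
    print(z, round(100 * normal_percentile(z), 1))
```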

Another benefit of Standardizing is the use of negative standard deviations, so you can see when a player is really hurting your team!

The Workarounds:

Okay so the bad news here is ESPN only shows the 8 categories. If you’re playing with TOVs, how does that fit in? Also, how do I calculate the FG% and FT% numbers since they aren’t raw numbers?

Well here’s where we start doing things for ourselves. Pick up your scripting language of choice, or start copying and pasting CSV text from basketball-reference/your favorite website. Now then, turnovers are the easier part: since they work as a negative statistic, I simply found all the Standard Scores, then changed the signs.
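A rough sketch of the turnover step in Python: standardize the category against the whole player list, then flip the sign so fewer turnovers score higher. The turnover totals are made up for illustration, and the `higher_is_better` flag is just my way of expressing the sign change.

```python
import statistics

def standard_scores(values, higher_is_better=True):
    """Standard score of each value against the whole list (population SD)."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    sign = 1.0 if higher_is_better else -1.0
    return [sign * (v - mean) / sd for v in values]

# Hypothetical turnover totals for five players, for illustration only.
turnovers = [250, 180, 120, 60, 40]
tov_scores = standard_scores(turnovers, higher_is_better=False)
for tov, score in zip(turnovers, tov_scores):
    print(tov, round(score, 2))
```

The player with the most turnovers ends up with the most negative score, which is the behavior you want in a composite.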

For FG% and FT%: I personally believe ESPN doesn’t give enough weight to the amount shot. Shouldn’t LeBron shooting 50.0% at 20.2 FGA/g have more impact than Varejao shooting 57.1%, but only 6.4 FGA/g? Well I think so, which is why I normalized first, then weighted by shots taken before dividing by standard deviation. My FGscore is defined as:

[Image: FGScore formula]

And I standardize the FGscore (the average is already zero, so really I’m just dividing by standard deviation). So LeBron ends up with 2.37, and Varejao with 2.07, not that anyone would think of drafting the latter over the former. But in any case, now we can properly rank players by their FG%, so all the lacktators with a 1.000 FG% filter to the bottom.
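Since the formula itself isn't shown here, the following Python sketch is only one plausible reading of the description (normalize, weight by shots taken, divide by standard deviation), not necessarily the exact FGscore. It assumes the raw score is (player FG% minus league FG%) times attempts, with league FG% taken as total makes over total attempts, which is what makes the average come out to zero. The shooting totals are hypothetical.

```python
import statistics

def shooting_scores(makes, attempts):
    """Volume-weighted shooting scores, divided by the population SD.

    Assumed form: (player % - league %) * attempts.
    Assumes every player has at least one attempt.
    """
    league_pct = sum(makes) / sum(attempts)  # attempt-weighted league average
    raw = [(m / a - league_pct) * a for m, a in zip(makes, attempts)]
    sd = statistics.pstdev(raw)
    return [r / sd for r in raw]

# Hypothetical (makes, attempts) season totals, for illustration only.
makes    = [770, 260, 1, 300, 150]
attempts = [1540, 455, 1, 700, 400]
fg_scores = shooting_scores(makes, attempts)
for m, a, s in zip(makes, attempts, fg_scores):
    print(f"{m}/{a}: {s:+.2f}")
```

Under this assumed form, the third player's perfect 1-for-1 lands close to zero instead of topping the list, because one attempt carries almost no weight.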

Same thing with FT%: Is Nash's 94.1% (2.7 FTA/g) or Carmelo's 83.1% (9.3 FTA/g) helping you win the category more? ESPN puts Nash over ‘Melo, but using my FTscore, Carmelo scores a 2.67 while Nash scores a 2.36. Of course, Durant and Dirk still dominate the category.

The Advanced Bad News:

Okay, I’ve been far too positive towards ESPN. This sounds almost too good to be true. What are the limitations of this method? Like I said, Z-Score happens to work well since we have the entire population of data. However, a simple glance at the data will show you that we are NOT working with normally distributed data, which is what reading Standard Scores as percentiles assumes! Going one step further, I looked at the skew and kurtosis of each category, and they are off the charts, with the worst skew on blocks at 2.2 and the worst kurtosis on FT% at 16.81.
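For anyone who wants to check skew and kurtosis on their own data, here is a small stdlib-only sketch. It uses the population moment definitions and reports excess kurtosis (zero for a normal distribution); I'm assuming that convention, and a stats package may use a slightly different one. The block totals are invented to mimic a category where one player towers over the rest.

```python
import statistics

def skew_and_kurtosis(values):
    """Population skewness and excess kurtosis from central moments."""
    mean = statistics.fmean(values)
    n = len(values)
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skew = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0  # 0 for a normal distribution
    return skew, excess_kurtosis

# Hypothetical block totals: most players have few, one has a lot.
blocks = [0, 1, 2, 2, 3, 4, 5, 8, 12, 60]
skew, kurt = skew_and_kurtosis(blocks)
print(round(skew, 2), round(kurt, 2))
```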

In simpler terms, this means some standard scores at the far ends may be inflated more than they should be. For example, Dwight gets a score near 6 in blocks, which statistically should not happen in a population of only ~450 people. Under a normal curve, that would be rarer than one in a million. So as with all advanced statistics, use them carefully!
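As a rough check on how rare such a score would be if the category really were normal, this sketch computes the upper-tail probability for a few standard scores and the expected number of players beyond each in a pool of 450. The function name is mine; the 450 is the approximate population size mentioned above.

```python
import math

def upper_tail(z):
    """Chance that a normal value lands more than z SDs above the mean."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

players = 450  # rough population size quoted in the post
for z in (3, 4, 5, 6):
    p = upper_tail(z)
    print(f"z={z}: p={p:.2e}, expected players={players * p:.2e}")
```

Even at a score of 3 you would expect fewer than one such player in a pool this size, so a score near 6 is a sign of a skewed distribution rather than a literal probability.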

In addition, I did a Principal Component Analysis (PCA) on the 9 factors. Turns out there’s such a strong negative correlation between Points and Turnovers, and modestly strong correlations between Points and other categories, it’s not even worth bothering looking at the TOV category! Stupid Yahoo!!
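This is not a full PCA, just a sketch of the pairwise correlation behind that finding, written with the standard library. The season totals are hypothetical; the point is only that once turnovers are sign-flipped, a player who scores a lot tends to have a strongly negative turnover value.

```python
import math
import statistics

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical season totals, for illustration only.
points    = [2033, 1500, 900, 400, 120]
turnovers = [250, 190, 130, 70, 20]

# Turnovers are scored with the sign flipped, so more is worse.
tov_value = [-t for t in turnovers]
r = pearson(points, tov_value)
print(round(r, 3))
```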

The Advanced Customization:

So maybe you hate Turnovers. Maybe you hate my FGscore and FTscore. I think it’s also perfectly valid to try and dominate the 6 raw stat categories! It’s very intuitive, and any wins in FG%, FT%, or TOV are just gravy. In fact, I’ve done just this...

Putting it all together: We can analyze total season numbers, per game numbers, or per (36) minute numbers. I’ve proposed looking at standard scores of the 9 categories like Yahoo!, the 8 categories like ESPN, or the 6 raw stat categories. Well since we’re already working with Standard Scores... why not just add composite scores together? I did this with the completed 2008-09 data. So total season numbers help show how much a player contributed, per game numbers account for some injuries and such, and per minute numbers account for varying playing time. Since they’re all standardized now, I just add them together to get a super-composite score, for a really quick look at who did the best (i.e. which players I should really be comparing during my 90 seconds to draft)!
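The super-composite described above can be sketched like this: standardize every category within each view (totals, per game, per 36 minutes), sum across categories to get a composite per view, then sum the three composites. The three players and two categories are hypothetical, and the dictionary layout is just one convenient way to hold the data.

```python
import statistics

def standard_scores(values):
    """Standard score of each value against the whole list (population SD)."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]

def composite(table):
    """Sum each player's standard scores across every category in `table`.

    `table` maps category name -> list of values, one per player.
    """
    per_category = [standard_scores(col) for col in table.values()]
    return [sum(scores) for scores in zip(*per_category)]

# Hypothetical players A, B, C with two categories in each of three views.
totals   = {"pts": [2000, 900, 300], "reb": [500, 700, 100]}
per_game = {"pts": [26.0, 15.0, 10.0], "reb": [6.5, 11.0, 3.0]}
per_36   = {"pts": [24.0, 17.0, 14.0], "reb": [6.0, 12.0, 5.0]}

views = [composite(t) for t in (totals, per_game, per_36)]
super_composite = [sum(scores) for scores in zip(*views)]
print([round(s, 2) for s in super_composite])
```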

After doing all this, and comparing it to 9, 8, and 6 category, it turns out there’s lots of correlation among them, but the analysis that made overall sense was... 6 category! Are you serious?! After all that work I did messing with TOV and FG% and FT%, you could essentially ignore them?

Well sorta. Mostly, it’s Steven Hill’s fault. Because he played in only one game with 2 minutes played, his fantasy impact scales to the absurd (remember all that stuff about skew and kurtosis). But looking at only the 6 raw stats, even he can’t escape a more proper ranking!

Another way to avoid this, and possibly help further normalize the data: just take the top 200 players, or top 100, or whatever, and treat that as your total population, because let’s be honest: no one’s putting Mario West or JamesOn Curry on their fantasy teams. Hell, to simplify things, take the top 4 players and do a pretend 2-person draft. From there, you can see what categories you’re taking, which you’re giving away, and which analysis to use.
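A sketch of the top-N idea: rank players, keep the best N, and recompute the mean and standard deviation from that smaller pool only. Here I rank by the stat itself for simplicity; ranking by a composite score would work the same way. The point totals are hypothetical.

```python
import statistics

def restandardize_top(values, n):
    """Standard scores computed only against the n largest values.

    Returns (index, score) pairs for the kept players, best first.
    """
    ranked = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    kept = ranked[:n]
    pool = [values[i] for i in kept]
    mean = statistics.fmean(pool)
    sd = statistics.pstdev(pool)
    return [(i, (values[i] - mean) / sd) for i in kept]

# Hypothetical point totals; the last two are end-of-bench players.
points = [2033, 1800, 1500, 1200, 900, 12, 4]
top5 = restandardize_top(points, 5)
for index, score in top5:
    print(index, round(score, 2))
```

Dropping the end-of-bench players raises the mean of the pool, so the top scores shrink toward something closer to what a draftable player is actually competing against.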

The End:

I hope this gave you some insight into stats and fantasy basketball. Of course, when it comes to injuries and rookies etc., you’re still on your own. The method I presented is highly useful for roto or h2h, and can be expanded or contracted to your liking, despite its limitations. Want to look at the past 3 years combined? The past month only? Go for it. Don’t just trust those pre-built rankings anymore; grab your favorite programming language/spreadsheet/abacus and find those undervalued steal picks!

Other random notes: Yes, I graphed a ton of stuff while doing this. Tips I picked up:
  • With the top picks, don’t over-value 3pt shooting. It is easy to pick that up later in the draft, or with waivers.

  • As I implied before, FG% and FT% are pretty even down the board. Use Standard Scores to slightly suggest one guy over the other. e.g. If you pick up Dwight Howard, concentrate on FG% guys, ’cause there’s probably no combination of players you can pick up to make up the FT% column.

  • Blocks are sparse, but spread out down the board.

  • Of the remaining stats, Steals and Points have the strongest correlation, and Steals is usually a close category (so every little standard score counts!). This is probably why drafting bots tend to eat up all the point guards early.

And finally, in good Basketbawful fashion, how does my ranking compare to ESPN’s for the worst fantasy player of the season so far?

-6.05 Primoz Brezec
-6.05 Jarron Collins
-6.01 Kwame Brown
-5.98 Eddy Curry
-5.97 Lindsey Hunter
For reference, Mario West has a -5.54.
