In search of a new report card
The following passage begins “Simulators,” the introduction to the 2012 Baseball Forecaster:
“As fantasy owners, we take pride in our ability to assess baseball players and to project their performance. Our success in piloting our teams to titles and money finishes demonstrates our prowess at these skills.
But it really doesn’t.
Player evaluation and forecasting are vital parts of building the foundation of a winning team. But we don’t give ourselves the opportunity to measure the results of that evaluation process. Yes, we get a “report card” every October, but it’s deceptive. Our final point totals are amassed not only by the players we draft—the results of our painstaking off-season research—but also by many other players who churn through our roster during the season.
We only truly exercise our player evaluation muscles during the off-season. The stats have stopped accumulating, allowing us to dig into them, evaluating each player’s individual performance in deeper context, and thus to project probable future performance. Then we draft players based on that comprehensive analysis, and our expectation of what they are going to do over six months.
But we typically base our in-season roster moves on far shorter periods, and far smaller sample sizes, making these decisions less accurate and the results more volatile. Every player we add to our roster during the season means we’re eliminating one player we likely rostered through more careful analysis. And given how much turnover a typical fantasy roster undergoes during the course of a year, our in-season roster management can negate much of our off-season player evaluation efforts.
As a result, we have only limited insight into how good anyone is at player evaluation. Even ourselves. We need a way to more accurately evaluate this core competency, and at the same time, hone our skills and gauge our efforts at assembling an optimal roster. We could just take all our pre-season projections and compare them to the end-of-season actuals. That’s easy. However, that wouldn’t assess our skill within the context of building a team.”
While I set out to solve this problem in the book – and you can “cheat” by going back to read it if you have a copy – I want to put it out to you here again. I think the accurate assessment of our abilities is still an open issue, particularly now.
BABS needs its own unique report card. In fact, I think BABS should be its own laboratory in which an assessment measure is just one element.
This is one of my thought projects for the off-season. It will also include the design of a 2018 fantasy game format that could serve as our laboratory’s testing ground.
I’m interested in your comments, ideas and random brain flakes.
This ongoing search ("I think the accurate assessment of our abilities is still an open issue") is why I follow you with increasing interest. Anyone who believes they have it figured out is a BSer, while those who openly pursue progress are worth following.
The measure I'd suggest is an annual review of how BABS cluster rankings fared vs. the ADP of the same players. I feel I have found, qualitatively, that BABS's long view beats the "popularity" index of ADP, but you'd gain by measuring that result and then publishing the data at a macro level.
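In code, that annual review might look something like this minimal sketch (the CSV file and column names are hypothetical placeholders for real pre-season ranks and end-of-season values):

```python
# Compare two pre-season orderings (BABS cluster rank, ADP) against
# end-of-season earned value using rank correlation.
import pandas as pd
from scipy.stats import spearmanr

# Hypothetical file: one row per player with babs_rank, adp, eos_value.
df = pd.read_csv("season_review.csv")

# Rank actual value so 1 = best season, matching the pre-season rank scales.
actual_rank = df["eos_value"].rank(ascending=False)

babs_corr, _ = spearmanr(df["babs_rank"], actual_rank)
adp_corr, _ = spearmanr(df["adp"], actual_rank)

print(f"BABS vs. actuals: {babs_corr:.3f}")
print(f"ADP vs. actuals:  {adp_corr:.3f}")
```

Whichever ordering shows the higher correlation, year over year, is the one doing a better job of anticipating actual performance.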
Yes, that is a pretty straightforward approach. However, that only measures the “accuracy” of BABS. I’m trying to take it a step further – measuring BABS within the context of team construction. Assuming that BABS does a better job of capturing player performance (yes, still an assumption), my question is whether BABS helps us to construct better teams. Some type of multiple draft or league scenarios – reaching some critical mass, perhaps – are required here.
It seems as if BABS highlights groupings of players that are similar. Assume a straight draft, that not all of those players will be available to you, and that you will have positional or category needs when your pick comes around that exclude some groupings. It seems to me you would have to run multiple simulations of these groupings, randomizing players within each group and using some sort of hierarchy to prioritize by position (which will change year to year with the strength of each position), to generate thousands of possible teams. Then, with those teams, use their actual stats to generate standings. Perhaps that would do nothing more than identify areas for improvement, but it is a complex task you are setting out on, involving lots of simulations.
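The core loop of that idea might look like this toy Monte Carlo sketch, where the player pool, group numbers, and point totals are all hypothetical stand-ins for real BABS data:

```python
# Thousands of mock drafts in which each pick is randomized within the
# best available BABS group, then each team is scored on actual stats.
import random
from statistics import mean

# player -> (BABS group, actual end-of-season points); lower group = better.
PLAYER_POOL = {
    "Player A": (1, 410), "Player B": (1, 350),
    "Player C": (2, 390), "Player D": (2, 300),
    "Player E": (3, 330), "Player F": (3, 280),
    "Player G": (4, 250), "Player H": (4, 200),
}

def simulate_draft(n_teams=4, rounds=2):
    """One snake draft in which each pick is a random player from the best
    group still available (a stand-in for positional/category pressure)."""
    available = dict(PLAYER_POOL)
    teams = [[] for _ in range(n_teams)]
    for rnd in range(rounds):
        order = range(n_teams) if rnd % 2 == 0 else reversed(range(n_teams))
        for t in order:
            best = min(group for group, _ in available.values())
            pick = random.choice(
                [p for p, (group, _) in available.items() if group == best])
            teams[t].append(pick)
            del available[pick]
    return teams

def score(team):
    """Score a team on the actual stats its players produced."""
    return sum(PLAYER_POOL[p][1] for p in team)

# Many simulated drafts -> distribution of outcomes per draft slot.
results = [[] for _ in range(4)]
for _ in range(10_000):
    for slot, team in enumerate(simulate_draft()):
        results[slot].append(score(team))

for slot, scores in enumerate(results, start=1):
    print(f"Draft slot {slot}: mean points = {mean(scores):.0f}")
```

A real version would layer in roster slots, category standings, and the full player pool, but the structure — randomize within groups, draft, score on actuals, repeat — is the same.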
A simpler method would be to piece it out. For example, I would simply like to know the correlation between players who were projected to have injury issues (or not) and whether they actually got injured (or not). So much of winning depends on 1) keeping your players in the lineup, especially the good ones, and 2) finding the breakout player who has a good shot at delivering unexpected value. If BABS does not give a reasonable leg up in those two areas, I'm not sure it's worth the effort to attempt a more complex analysis, as they seem core to the utility of BABS.
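That injury check could be as simple as a 2x2 table and a phi coefficient; the counts below are made-up placeholders:

```python
# 2x2 check of "BABS injury flag" vs. "actually landed on the DL".
# phi > 0 means the flag carried real signal; ~0 means no relationship.
import math

flagged_hurt, flagged_ok = 40, 25      # players drafted with an -inj flag
unflagged_hurt, unflagged_ok = 30, 85  # players drafted without one

phi = (flagged_hurt * unflagged_ok - flagged_ok * unflagged_hurt) / math.sqrt(
    (flagged_hurt + flagged_ok) * (unflagged_hurt + unflagged_ok) *
    (flagged_hurt + unflagged_hurt) * (flagged_ok + unflagged_ok)
)
print(f"phi coefficient: {phi:.3f}")
```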
I do remember one time when Ron talked about a very important part of a player's value: playing time. Accurately evaluating a player, or a whole league, is like trying to accurately forecast the weather. We keep searching for the best evaluation formula in order to draft the best team and get an edge on those in our leagues. For me, BABS is the best, and if there's a better way, my money is on Ron to figure it out. A 2018 fantasy game format as a testing ground is an excellent idea. I would hope that format would include AL/NL-only and mixed, to mirror the leagues we all participate in.
I think BABS does provide the best measure we fantasy leaguers have. I am a big fan. By grouping players with similar talent—as that talent has been displayed historically—we can work to amass as many players as we can in any given asset group, which in turn allows the best shot at a core team.
The problem, of course, is the sustained application of that talent, which is remarkably volatile. Just look at Miguel Cabrera this year, for whom Ron was tempted at the beginning of the season to create a special category of batting prowess. He has tanked this entire year, and for no obvious reason: no major injury has been identified, no source of the problem clarified. He simply isn't playing well. He's batting about as well as the average catcher, which is to say, worse than half of them.
The volatility of applied talent is the reason we are tempted to make changes in our rosters during the season. In effect, we fight fire with fire: to combat the unexpected and the unpredicted, we substitute a player whose current performance concretely suggests improvements to the accumulation of counting stats—for however long that particular performance lasts.
So it seems to me that we construct a core team, and then respond on the fly to the volatile outbreak of injury and failure.
Where I am going with this is to wish for a better definition and measurement of risk. Cespedes' injury risk, for instance, is ridiculous, and of another order than JD Martinez's, though both are listed at a similar risk according to BABS. Perhaps more importantly, we need a quantitative measure of relative risk. Is changing leagues more of a risk than merely changing teams? Are there variations in the risk of aging (Bautista versus Cruz, for instance)? Is the injury risk of a player on the DL because he crashed into a wall greater than, or less than, that of a player who pulled a hamstring running to first base?
If we had a better handle on risks, I think we might have more insight into volatility—which, in turn, is the enemy of BABS.
Just today, Brad Ausmus admitted that Cabrera has been battling chronic back issues all year and might have to deal with them in the future.
I'm curious about the effects of daily roster management on my team. For instance, when I know a player isn't in my lineup, I'll sometimes replace him with a player on my bench. I know maximizing ABs is good, but at what point do the results turn negative? I also struggle with knowing when to replace players who are performing badly. Last year, Justin Upton, after a long, long time, eventually produced numbers close to what we expected. Miggy, this year, won't. The revised DL has also impacted roster management. Is it better to replace a player who will be back soon with an inferior substitute, just to get those precious ABs and counting stats? The DL issue very definitely reared its ugly head for me this year — I have made over 35 DL moves so far (some players more than once), with 15 of my original 25 drafted active (i.e., not reserve) players going on the DL at some point (again, so far).
The secondary report card in your 2nd paragraph is easy to do and I’ll probably run some reports this off-season. I think part of the answer from your 1st paragraph is to just get into a TON of leagues. But more on that later…
These days there are too many challenges to be able to accurately project playing time. I think BABS continues to handle it as best as can be expected. We’ll talk more about all this other stuff soon…
Risk has as wide an error bar as performance – perhaps even wider – so that will continue to be a challenge. And remember that no matter how well BABS does in helping us construct our rosters, she is NOT going to be perfect (don’t tell her I said that). There was no way that we would have been able to predict how bad a season Miggy is having. There will always be outliers. That said, he did come into the season with an “-inj” liability rating. On its face, that might seem insufficient to explain his performance, but odds are we will find out that the little nagging injuries he’s had this year were a lot more significant than reported.
I should have read ahead. There you go.
Assuming a traditional league set-up... On the batting side, more ABs are always beneficial. There is only one ratio category to deal with – BA – and one player's failings are typically not significant enough to drag down an entire team. Multiple bad-BA players are another story. Pitchers are a bit riskier.
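The quick arithmetic behind that point, with hypothetical numbers: one .220 hitter among thirteen .275 hitters moves team BA by only about four points.

```python
# Illustration of how little one poor BA dilutes a full lineup.
def team_ba(hitters):
    """hitters: list of (AB, BA) pairs."""
    hits = sum(ab * ba for ab, ba in hitters)
    at_bats = sum(ab for ab, _ in hitters)
    return hits / at_bats

lineup = 13 * [(550, 0.275)]
print(f"Without the .220 hitter: {team_ba(lineup):.4f}")                   # 0.2750
print(f"With him:                {team_ba(lineup + [(550, 0.220)]):.4f}")  # ~0.2711
```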
Odds are there will be a lot written about the 10-day DL this winter. We all faced enormous obstacles as a result of it. But players got legitimately hurt too. My SiriusXM experts league team currently has a DL list that looks like this:
Brandon Belt
Trevor Cahill
Willson Contreras
Nicky Delmonico
Jedd Gyorko
Bud Norris
Chris Owings
David Price
Drew Smyly
Yasmany Tomas
Jesse Winker
Alex Wood
Is there a way to combine and weight BABS in-season projections against an overall long/career view, with weights determined by variables such as the length and volatility of a career relative to current in-season performance? You would use a set of in-season sample sizes (25 games | 50 games | 75 games | 100 games), with each sample size weighted accordingly. Then you could assess either a single output or a side-by-side comparison of sample performance vs. career indicators, taking into account other measurables that might provide qualitative evidence of changes in behavior (launch angle, zone contact, exit velocity). It would be cool to look at career | sample | contributing factors to come up with the likelihood that the sample performance is sustainable.
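One possible blend along those lines, as a sketch using a simple regression-to-the-mean weighting; the career rate, sample rates, and stabilization constant are all invented for illustration:

```python
# Weight in-season samples of increasing size against a career baseline.
STAB = 100  # games at which sample and career get equal weight (assumed)

career_rate = 0.350  # career skill rate, e.g., an OBA-like measure
samples = {25: 0.410, 50: 0.395, 75: 0.380, 100: 0.372}  # games -> rate

for games, rate in samples.items():
    w = games / (games + STAB)          # bigger samples earn more trust
    blended = w * rate + (1 - w) * career_rate
    print(f"{games:>3} games: sample={rate:.3f}  blended={blended:.3f}")
```

Real weights would have to be determined by testing, and contributing factors like launch angle or exit velocity would enter as separate evidence for or against sustainability.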
I remain reluctant to do anything with in-season ratings (and they are not projections, just ratings) beyond observing them from a safe distance. However, the idea of using full season ratings to provide more insight about longer-term career trajectory is intriguing. I am not ready to take that leap yet, but I can see it as a natural next step down the line. Cool thought.
Lots of good ideas here on how to assess the predictive value of BABS.
Here’s my core idea for some “leagues of our own,” followed by a few details:
We didn't get enough takers at FPAZ to establish a BABS league. That creates an opportunity to run leagues among subscribers, possibly inviting FPAZ attendees if we still have openings around January 1. I think there could be enough interest for a 12-team AL-only, a 12-team NL-only, and a 20-team mixed league.
Minimizing bench size places greater weight on prediction and less on in-season roster management, while reducing the impact of speculative rostering based more on paths to PT than on assets and liabilities. With an NFBC (or other cooperating site) slow draft—straight or snake—we could start soon after the holidays and finish in time for a spring supplemental draft of the prospects and rookies added to the BABS pool by then. Of course, it's entirely up to you, Ron, but here are my suggested details (summarized as a config sketch after the list):
1. Owners agree to rely solely on BABS groupings for player selection.
2. Total roster size is 27. Active roster is 20: C, 1B, 2B, 3B, SS, IF, 4 OF; 7 SP and 3 RP.
3. Bench reserves are: C, CI, MI, OF, 2 SP, RP. Weekly transactions between active and bench.
4. Slow draft of 24-25 players at any required positions (and no strict definition of SP/RP); supplemental slow draft for the final 2-3 players to fill the remaining required positions.
5. FA drafts/bids ~June 1 and ~August 1 to replace up to a maximum of 2 non-MLB DL players and/or a maximum of 3 MLB DL players each time period; straight draft/bid in reverse order of standings (no FAAB $).
6. Hitting categories: TB (or HR, power skill), SB (speed skill), OBA (or BA, core batting skill + only ratio), PA (or AB, playing time).
7. Pitching categories: K (dominance), SV+HLD (relievers), WHIP (core pitching skill, not team dependent like ERA + only ratio), IP (or BF, playing time).
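The proposed format above, captured as a plain config; field names are illustrative, and the point is that AL-only, NL-only, and mixed variants could be generated and compared from one spec:

```python
LEAGUE_FORMAT = {
    "total_roster": 27,
    "active": {"C": 1, "1B": 1, "2B": 1, "3B": 1, "SS": 1,
               "IF": 1, "OF": 4, "SP": 7, "RP": 3},
    "bench": {"C": 1, "CI": 1, "MI": 1, "OF": 1, "SP": 2, "RP": 1},
    "hitting_categories": ["TB", "SB", "OBA", "PA"],
    "pitching_categories": ["K", "SV+HLD", "WHIP", "IP"],
    "faab": False,
    "fa_periods": ["~June 1", "~August 1"],
}
```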
Initial rosters could be judged on assets and liabilities, and we could see a small sample of how they play out. Based on a comparison of initial rosters and final standings, which assets seem more predictive of success, and which liabilities more predictive of failure? With a limited bench and no FAAB, the effect of intervening roster-management variables would be reduced, but there would be enough flexibility to keep owners involved and having fun. That leaves a more direct connection between final standings and BABS's predictive value.
Greg – Thanks for the thoughtful post. Some quick points:
– Definitely going to run some leagues this winter. Those FPAZ attendees who missed out will get first dibs on spots.
– I disagree about minimizing roster spots. All that does is either a) relinquish results to the impact of the DL, making it a last-man-standing league, or b) force in-season moves to be the determining factor. I still think that deep draft rosters and minimal (or no) in-season free agent access is the best way to measure prognosticating ability.
– Still playing around with the categories but I like a lot of what you are suggesting.