College Football Regression Analysis

ramblinwise1

I am updating an old multiple linear regression analysis that uses standard box score inputs to predict a team's score. I have only been using results from top 10 teams, and I haven't got them all in yet, but I'm pretty sure it will change little now. Based on 55 games so far this year, I have an R^2 = .733, so the model explains about 73% of the variation in score. Based on the model coefficients, here are the actual and predicted scores for each of our games this year:

Team, Actual, Predicted

Jack St, 37-17, 41-18
Clems, 30-27, 29-27
Miami, 17-33, 16-37
UNC, 24-7, 30-5
MSU, 42-31, 38-35
FSU, 49-44, 46-45
VT, 28-23, 25-24
UVA, 34-9, 33-12
Vandy, 56-31, 51-32

Pretty cool, huh.:D
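
For anyone who wants to try this at home, here is a rough sketch of the kind of fit described above: ordinary least squares on box score inputs, plus the R^2 calculation. The stat columns, the coefficients, and the data are all made up for illustration (random numbers, not real games), so don't read anything into the output. It only shows the mechanics.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data: 55 "games" with made-up box score inputs.
# The choice of columns is an assumption, not the actual model's input list.
n_games = 55
X = np.column_stack([
    rng.normal(180, 60, n_games),   # rush yards
    rng.normal(220, 70, n_games),   # pass yards
    rng.integers(0, 5, n_games),    # turnovers
    rng.normal(30, 4, n_games),     # time of possession (minutes)
    rng.normal(40, 10, n_games),    # 3rd down conversion %
])

# Made-up "true" relationship plus noise, only so there is something to fit.
true_coefs = np.array([0.06, 0.05, -3.0, 0.2, 0.1])
points = X @ true_coefs + rng.normal(0, 5, n_games)


def design(X):
    """Prepend a column of ones so the fit includes an intercept."""
    return np.column_stack([np.ones(len(X)), X])


def fit_ols(X, y):
    """Least-squares fit; returns [intercept, coef_1, ..., coef_k]."""
    coefs, *_ = np.linalg.lstsq(design(X), y, rcond=None)
    return coefs


def r_squared(y, y_hat):
    """Share of the variation in y explained by the predictions."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot


coefs = fit_ols(X, points)
r2 = r_squared(points, design(X) @ coefs)
print(f"in-sample R^2 = {r2:.3f}")
```

Note that this R^2 is computed on the same games used for the fit, which is the same situation as the .733 quoted above.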
 
So what's the Wake score going to be?

I assume that the regression works by knowing lots of historical factors like:

yards gained
Yards against
turnovers
time of possession
 
How in the name of Zeus' beard did the model predict that Vandy scores 32 and vpisu only scores 24 on us? If you actually have a model that can correctly pick the winner of a game 100% of the time, then what the hell are you doing on this message board?! You need to be in Vegas.
 
Did you hold out some games to see how the model predicts scores that weren't used to make the model?
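
A hold-out check along these lines might look like the sketch below. It assumes nothing about the actual model: the data are random numbers with a made-up linear relationship, and the 20% split is an arbitrary choice. The point is just that the coefficients are fit on one set of games and scored on games the fit never saw.

```python
import numpy as np

rng = np.random.default_rng(1)

# Made-up data: 55 games, 3 hypothetical box score inputs, noisy linear score.
X = rng.normal(size=(55, 3))
y = 28 + X @ np.array([6.0, 4.0, -5.0]) + rng.normal(0, 4, 55)

# Hold out roughly 20% of the games; fit only on the rest.
idx = rng.permutation(len(y))
n_test = len(y) // 5
test_idx, train_idx = idx[:n_test], idx[n_test:]


def design(X):
    """Prepend a column of ones so the fit includes an intercept."""
    return np.column_stack([np.ones(len(X)), X])


def r_squared(y_true, y_pred):
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


# Fit on the training games only.
coefs, *_ = np.linalg.lstsq(design(X[train_idx]), y[train_idx], rcond=None)

r2_train = r_squared(y[train_idx], design(X[train_idx]) @ coefs)
r2_test = r_squared(y[test_idx], design(X[test_idx]) @ coefs)
print(f"train R^2 = {r2_train:.3f}, held-out R^2 = {r2_test:.3f}")
```

If the held-out R^2 comes in well below the training R^2, the model is fitting noise in the games it was built on.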
 
that is some badass correlation, ramblin!

good job. now get rich!
 
Did you hold out some games to see how the model predicts scores that weren't used to make the model?
This.

You're using the same trick the Global Warming guys are - calibrating a model to historic data and then claiming the model can predict. :)

What's the Wake score going to be? More importantly, what's the Clemson score going to be?

Also:

good job. now get rich!
..or at least share the data so we can all get rich.
 
either you're full of ****, or you need to send in your calculations to ESPN and get them some redemption. This is insane...

put up the Wake score! we'll see how well it predicts, too :biggthumpup:
 
Does it only work with data after a game is over? In other words, do you need the box score data from a game to feed into the model? Or do you use historical data from the season, i.e. averages?
 
Does it only work with data after a game is over? In other words, do you need the box score data from a game to feed into the model? Or do you use historical data from the season, i.e. averages?

Even if it's only the former, you could do "what if" scenarios to see what game plan would lead to the best outcome; not as useful, but still interesting.

E.g., if GT rushes for 300+ yards, then GT would win the game in 80% of the cases unless Duke goes over 400 yards passing...
 
Not so fast guys....

All games that the top 10 teams have played are in there except for about 3 teams. I did leave out any games against FCS teams except GT vs Jack St.

Before you get too excited, remember that this says I can predict the score IF I KNOW WHAT THE OUTCOME ON THE INPUT X VARIABLES IS GOING TO BE. Of course, to actually pick a score ahead of time as well as the model does looking backwards, I would have to accurately predict the rush yards, turnovers, pass yards, TOP, and 3rd down conversion %. (insert big letdown here). But it's still pretty neat to think that if you know those parameters, you can predict the score.

I will take the averages for Wake and Tech on the inputs and tell you what the predicted score is.
 
I will take the averages for Wake and Tech on the inputs and tell you what the predicted score is.

Better idea: normalize them vs what the opponent generally allows.

Instead of giving Tech our raw rushing average, give Tech the amount by which we usually beat our opponents' average rushing defense, added on top of what Wake's rushing D allows.

Make sense?
 
Yes, average Wake's average rushing yards with Tech's rushing yards allowed to predict Wake's rushing yards against Tech. Yes, that's better.
 
I would suggest you input only ACC conference game stats to see what the projection is - rather than including OOC games as well.
 
Yes, average Wake's average rushing yards with Tech's rushing yards allowed to predict Wake's rushing yards against Tech. Yes, that's better.
Unless I'm reading this wrong, you two are not saying the same thing.

Beej is saying that if we NORMALLY beat our opps' allowed rushing yardage by 120%, then you should multiply Wake's average rushing yards allowed by 2.2 (100% + 120% = 220%).
 
Well, that is expected: the score is very well correlated with game stats.

Now try predicting the game stats from previous data, and then the score from those. That would actually be a pretty good predictor.

Is Sagarin completely score-based, btw? I.e., no stats?

This is a good idea for a research paper.
 
Yes, average Wake's average rushing yards with Tech's rushing yards allowed to predict Wake's rushing yards against Tech. Yes, that's better.

Better, but not the best.

The best thing to do, which might be too big of a pain in the ass, is look at how much Tech exceeds its opponents' averages, and then have Tech exceed Wake's average by that amount. Make sense?

If Tech rushes for 330 a game, but Wake allows 100 a game, then 215 isn't the best number to use. The best number to use would be to figure out how many rush yards our opponents give up on average, then figure out the delta we exceed that by. Then add that delta to what Wake gives up. Or, as paintballer said, do it by ratio.
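
To make the difference between the three approaches concrete, here is a small sketch. The 330 and 100 come from the example above; the 150 yards that Tech's opponents allow on average is a HYPOTHETICAL number picked only for illustration, and the function names are invented here as well.

```python
def simple_average(offense_avg, defense_allowed):
    """Split the difference between what the offense gains and the defense allows."""
    return (offense_avg + defense_allowed) / 2


def delta_method(offense_avg, opp_allowed_avg, defense_allowed):
    """Add the yards the offense gains beyond what its opponents
    usually allow to what this defense usually allows."""
    return defense_allowed + (offense_avg - opp_allowed_avg)


def ratio_method(offense_avg, opp_allowed_avg, defense_allowed):
    """Scale what this defense usually allows by the factor the
    offense usually beats its opponents' average by."""
    return defense_allowed * (offense_avg / opp_allowed_avg)


tech_rush_avg = 330.0      # from the example above
wake_rush_allowed = 100.0  # from the example above
opp_rush_allowed = 150.0   # HYPOTHETICAL: what Tech's opponents allow on average

avg = simple_average(tech_rush_avg, wake_rush_allowed)
delta = delta_method(tech_rush_avg, opp_rush_allowed, wake_rush_allowed)
ratio = ratio_method(tech_rush_avg, opp_rush_allowed, wake_rush_allowed)

print(f"simple average: {avg:.0f}")
print(f"delta method:   {delta:.0f}")
print(f"ratio method:   {ratio:.0f}")
```

With that made-up 150, the simple average gives 215, the delta method gives 280, and the ratio method gives 220, so which normalization you choose can move the input quite a bit.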
 
I'm imagining a peregrinating UGA fan stumbling on this thread.

:eek5::laugher::laugher:
 
Is Sagarin completely score-based, btw? I.e., no stats?

This is a good idea for a research paper.

Sagarin uses one rating that is pure points and one that is just wins/losses. I don't think he uses any stats.
 
OK, I have used the averages here, but I can't quickly find 3rd down conversions allowed or penalty yards by opponent, so these are purely team averages.

Projected score if you use just the averages for each team (not normalized):

Tech 34, Wake 29

Now if you average the stats that I could find, it does wishy-wash it down some. For example, we average 304 yards rushing and Wake gives up 142 ypg (they have a good rush defense), so averaging the two projects Tech rushing for 223. If you do it this way for all stats except penalty yards and 3rd down conversion, then the projected score is

Tech 31, Wake 28

I think the ratio method would do better, but that's probably enough playing for today.

Either way, the projected margin is much closer than the line, which is a little scary. Skinner's questionable status is a big unknown.
 
For a test case, you can apply whatever you have done here to the Vandy and VT games (using predicted game stats instead of actual game stats, of course).

Btw, I'll do a bit of research on what previous work has used for game prediction. Maybe we can write up something on this!
 