We have the hardest remaining schedule in CFB

I have a feeling UGA definitely will. Their offense is a dumpster fire.

South Carolina is a dumpster fire that just lost to Kentucky, but it would not surprise me one bit if they come out, play good defense, and beat the dwags this weekend.

Spurrier > Richt
 
I use a webscraper to obtain data from ESPN's game pages, the NCAA's game pages, and Yahoo's recruiting site.

I figured, but one can hope. I have to scrape data myself every year for my home-built fantasy draft tool, because my leagues have atypical scoring and default player rankings don't reflect our rules. Mining projected scores for QB, RB, WR, TE, DEF, and K is a drop in the bucket compared to mining full team stats for 100+ teams, which is why I've never implemented my idea for a CFB team ranking and predictor tool.
 
Hit me up on PM, we'll talk nerd a bit and see if I can get you a scraper framework that will work for your project without spending too much time on it.
 
I use a webscraper to obtain data from ESPN's game pages, the NCAA's game pages, and Yahoo's recruiting site. I've been collecting and archiving data from various sources since 2011, and have managed to get a pretty comprehensive database going back to 2002 including an archive of all of the old cfbstats data.

It's free aside from the small amount of time I spend retooling the webscraper every time ESPN, NCAA, or Yahoo updates their site format. The data is not really in a good format for public consumption, however. I pull most of it down in JSON and store it that way. Since Python can build native objects from JSON and serialize them back, I also work with it in that format, so it doesn't go into a database or even a spreadsheet unless I have a specific need to analyze it in a different format. That hasn't happened in a while.
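For anyone curious what that JSON round trip looks like, here is a minimal sketch using only the standard-library `json` module. The field names (`week`, `home`, `away`, `home_pts`, `away_pts`), the helper names `save_games` and `load_games`, and the sample values are all made up for illustration; this is not the actual archive layout described above.

```python
import json
import tempfile
from pathlib import Path


def save_games(games, path):
    """Serialize a list of plain dicts to a JSON file."""
    Path(path).write_text(json.dumps(games, indent=2), encoding="utf-8")


def load_games(path):
    """Read the JSON file back into native Python lists and dicts."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


# Placeholder record; teams and scores are invented for the example.
games = [
    {"week": 1, "home": "Team A", "away": "Team B", "home_pts": 31, "away_pts": 17},
]

with tempfile.TemporaryDirectory() as tmp:
    archive = Path(tmp) / "games.json"
    save_games(games, archive)
    restored = load_games(archive)

# The restored data is ordinary lists and dicts, so it can be used directly.
margin = restored[0]["home_pts"] - restored[0]["away_pts"]
```

Since the restored objects compare equal to the originals, there is no conversion step between "stored" and "working" data, which is presumably why a database never becomes necessary.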

If you're interested in free data, it means work. If you're interested in fast, timely, complete, well-formatted data, you may end up having to pay for it. The last free reliable source I used was cfbstats, which has since become SportsSource Analytics. They'll run you a couple hundred bucks per season for data. There was an open-source Python ESPN webscraper on GitHub for a while that looked like it was working, but the last time I checked it was before ESPN updated their page format about a week before kickoff this year. No idea if it's still ticking. There are some football nerd types on some of the SB Nation blogs that also appear to have some sort of advanced data source, but I have never bothered to research it. I'd be interested to know, though, if anybody else knows more about that.

We also have a couple of stats gurus on this site aside from myself who might know of something.

Writing a scraper in Python using BeautifulSoup is pretty straightforward. I use it to compile projections from multiple sources for fantasy football prep. And yes, it is fragile to page layout updates, but oftentimes, if you rely on CSS classes to target HTML elements, those tend not to change, since that's the point of CSS (swap out the stylesheet but keep the "structure" or "attributes" the same and get a totally different look and feel).
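To make the "target by CSS class" idea concrete, here is a rough sketch. It deliberately swaps BeautifulSoup for the standard-library `html.parser` so it runs with no installs; with BeautifulSoup the same lookup would be roughly `soup.select("td.player-name")`. The class names `player-name` and `proj-pts`, the helper `texts_by_class`, and the sample table are all invented for the example and do not come from any real projections site.

```python
from html.parser import HTMLParser

# Tags that never get a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class ClassTextParser(HTMLParser):
    """Collect the text of every element carrying a given CSS class."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.depth = 0      # > 0 while inside a matching element
        self.current = []   # text fragments of the element being read
        self.results = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.depth:
            self.depth += 1
        elif self.css_class in classes:
            self.depth = 1
            self.current = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:
            self.results.append("".join(self.current).strip())

    def handle_data(self, data):
        if self.depth:
            self.current.append(data)


def texts_by_class(html, css_class):
    parser = ClassTextParser(css_class)
    parser.feed(html)
    parser.close()
    return parser.results


# Invented markup standing in for a projections page.
SAMPLE = """
<table>
  <tr><td class="player-name">Player One</td><td class="proj-pts">18.5</td></tr>
  <tr><td class="player-name">Player Two</td><td class="proj-pts">12.0</td></tr>
</table>
"""

names = texts_by_class(SAMPLE, "player-name")
points = [float(p) for p in texts_by_class(SAMPLE, "proj-pts")]
projections = dict(zip(names, points))
```

Nothing here depends on where the cells sit in the page, only on the class names, so a layout shuffle that keeps the classes would not break it. A redesign that renames the classes still would.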
 
I figured, but one can hope. I have to scrape data myself every year for my home-built fantasy draft tool, because my leagues have atypical scoring and default player rankings don't reflect our rules. Mining projected scores for QB, RB, WR, TE, DEF, and K is a drop in the bucket compared to mining full team stats for 100+ teams, which is why I've never implemented my idea for a CFB team ranking and predictor tool.


Jesus. What all stats do you need?

I used to roll my own scraper, but I've found a free source that's good enough. But it doesn't have play by play if that's what you're looking for.
 
Writing a scraper in Python using BeautifulSoup is pretty straightforward. I use it to compile projections from multiple sources for fantasy football prep. And yes, it is fragile to page layout updates, but oftentimes, if you rely on CSS classes to target HTML elements, those tend not to change, since that's the point of CSS (swap out the stylesheet but keep the "structure" or "attributes" the same and get a totally different look and feel).

Unfortunately, this was not the case with the ESPN redesign. They went to a RESTful implementation for the first time, so about 100% of the source changed. The only thing that stayed the same was the data source, which is obscured from me, for now.
 
I would love a chance to nerd up and make my own predictor, but I have a kid on the way and a nursery to build in my off hours. It might be an offseason endeavor.

<SIGH> Whatever happened to priorities?
 
Unfortunately, this was not the case with the ESPN redesign. They went to a RESTful implementation for the first time, so about 100% of the source changed. The only thing that stayed the same was the data source, which is obscured from me, for now.

I hate the redesign. Their pages don't even work right. Not AJC bad but really bad. It was nice and simple and usable before.
 
It's probably your anonymizer setup. Their new framework has all kinds of user analysis hooks in it that drive on-the-fly content customization. Oh, and they also track your every subtle move, probably to be used for market research and/or sold to advertising firms.
 
I like ESPN's new mobile site a lot. It's one of the more responsive sites out there, when other mobile sites shove in way too much crap.

The big scandal is the NCAA getting rid of the text-based stats site. It wasn't pretty, but had everything you would ever want in a clear text form.
 
I actually love the ESPN redesign for desktop but hate it for mobile. It delivers less information than it did before and it takes more taps to get to it.
 
Jesus. What all stats do you need?

I used to roll my own scraper, but I've found a free source that's good enough. But it doesn't have play by play if that's what you're looking for.

I have limited coding experience and build my stuff using Access queries or VBA. The math is no problem to me. I just don't have the web savvy to mine the data efficiently.

1. For the ranking: just records of who beats who and by how many points, per week. So I'd want a record of GT over Tulane by 55, and another for Tulane losing to GT by 55. I actually built this long ago but lost it, and I no longer have a database with which to work. It needed 3-4 weeks of data to spin up, so it would be a mess at ranking teams this week. I found it was very good at predicting late-season games, however.

2. For the predictor: New idea I've never tried. I'd need two records per game, each noting an offensive team, the opposing defense, the number of possessions, the offensive points scored, and the offensive yards gained. Don't need to know who won or lost.
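The two record shapes described in items 1 and 2 could be sketched like this in Python. The class names `MarginRecord` and `OffenseRecord`, the helper functions, and every number below are hypothetical: the GT/Tulane scores are placeholders picked only so the margin comes out to 55, and the possession and yardage figures are invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MarginRecord:
    """Ranking input: one row per team per game."""
    week: int
    team: str
    opponent: str
    margin: int  # positive for a win, negative for a loss


@dataclass(frozen=True)
class OffenseRecord:
    """Predictor input: one row per offense per game, no winner needed."""
    week: int
    offense: str
    defense: str
    possessions: int
    points: int
    yards: int


def margin_records(week, team_a, pts_a, team_b, pts_b):
    """Return the mirrored pair of margin rows for a single game."""
    return [
        MarginRecord(week, team_a, team_b, pts_a - pts_b),
        MarginRecord(week, team_b, team_a, pts_b - pts_a),
    ]


def offense_records(week, side_a, side_b):
    """Return one row per offense from two per-team stat dicts."""
    return [
        OffenseRecord(week, side_a["team"], side_b["team"],
                      side_a["possessions"], side_a["points"], side_a["yards"]),
        OffenseRecord(week, side_b["team"], side_a["team"],
                      side_b["possessions"], side_b["points"], side_b["yards"]),
    ]


# Week number and scores are placeholders; only the 55-point margin is from the post.
ranking_rows = margin_records(2, "GT", 55, "Tulane", 0)

# Entirely invented stat lines for two generic teams.
predictor_rows = offense_records(
    3,
    {"team": "Team A", "possessions": 12, "points": 28, "yards": 410},
    {"team": "Team B", "possessions": 11, "points": 17, "yards": 305},
)
```

Storing the mirrored pair up front means a per-team query never has to check both the "home" and "away" columns, which should translate easily to an Access table as well.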
 
Here's what I use.

http://www.drwagpicks.com/p/blog-page.html?m=1

Everything is csv. Offensive/defensive stats for every game.

Only downside is that he follows no discernible pattern in what he lists as a team name. Some will be "Michigan State", others "LSU" or "G State". Follow a ööööing pattern, dude.
 
I hate the redesign. Their pages don't even work right. Not AJC bad but really bad. It was nice and simple and usable before.

The new box score doesn't even bother to list the number of fumbles in a game.
 
Here's what I use.

http://www.drwagpicks.com/p/blog-page.html?m=1

Everything is csv. Offensive/defensive stats for every game.

Only downside is that he follows no discernible pattern in what he lists as a team name. Some will be "Michigan State", others "LSU" or "G State". Follow a ööööing pattern, dude.


Wow that's awesome, thanks for posting that stuff.

How did you handle sorting through the team names? Just make a lookup table for them or search and replace?
 
Trial and error.

What really pissed me off was when he changed a team's name between seasons.

If I wasn't getting this öööö for free....
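
For what it's worth, the lookup-table option asked about earlier might look something like the sketch below. Everything in it is an assumption: the CSV columns (`season`, `team`, `opponent`), the alias entries (including reading "G State" as Georgia State and the made-up "Mich St" spelling), and the helper names. Keying the table on season as well as name is one way to cope with a name changing between seasons.

```python
import csv
import io

# Hypothetical alias table: (season, name as listed) -> canonical name.
# A season of None means the alias applies to every season.
ALIASES = {
    (None, "Michigan State"): "Michigan State",
    (None, "LSU"): "LSU",
    (None, "G State"): "Georgia State",   # assumed meaning of "G State"
    (2015, "Mich St"): "Michigan State",  # invented season-specific spelling
}


def canonical_name(raw, season, aliases=ALIASES):
    """Map a listed team name to its canonical form, or fail loudly."""
    key = raw.strip()
    for candidate in ((season, key), (None, key)):
        if candidate in aliases:
            return aliases[candidate]
    raise KeyError(f"no alias for {key!r} in season {season}")


def normalize_rows(csv_text):
    """Parse CSV text and rewrite both team columns to canonical names."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        season = int(row["season"])
        row["team"] = canonical_name(row["team"], season)
        row["opponent"] = canonical_name(row["opponent"], season)
        rows.append(row)
    return rows


# Invented rows; they do not describe real games.
SAMPLE_CSV = """season,team,opponent
2015,G State,LSU
2015,Mich St,G State
"""

rows = normalize_rows(SAMPLE_CSV)
```

Raising on an unknown name instead of passing it through is deliberate: a renamed team shows up as an error on the first run rather than silently splitting into two teams, which takes some of the pain out of the trial and error.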
 