Code to scrape all divisions of NCAA football box scores from www.ncaa.com, beginning in the 2013-2014 season.
- scrape box score data on new website for current season: https://www.ncaa.com/scoreboard/football/d3
- make scoreboard scraping configurable for future runs, and keep the code that runs it easy to read
- clean data to ROUTE SQL standards
  - this includes ID checks for all CSV files
- error links check
- other data cleaning?
- potential compilation files?
Uses the dates in the constants file to loop over all applicable dates and divisions.
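The date and division loop could look roughly like the sketch below. The names `START_DATE`, `END_DATE`, `DIVISIONS`, `date_range`, and `scrape_targets` are hypothetical stand-ins for whatever the constants file actually defines, and the dates are placeholders. Only the `d3` division slug appears in the scoreboard URL above; the other slugs are assumptions.

```python
from datetime import date, timedelta
from itertools import product

# Hypothetical stand-ins for values that would live in the constants file.
START_DATE = date(2013, 8, 29)  # placeholder, not the real season start
END_DATE = date(2013, 8, 31)    # placeholder
DIVISIONS = ["fbs", "fcs", "d2", "d3"]  # only "d3" is confirmed by the URL above
BASE_URL = "https://www.ncaa.com/scoreboard/football"


def date_range(start, end):
    """Yield every date from start to end, inclusive."""
    for offset in range((end - start).days + 1):
        yield start + timedelta(days=offset)


def scrape_targets(start, end, divisions):
    """Yield (game_date, division, division_url) for every date/division pair."""
    for game_date, division in product(date_range(start, end), divisions):
        yield game_date, division, f"{BASE_URL}/{division}"


if __name__ == "__main__":
    for game_date, division, url in scrape_targets(START_DATE, END_DATE, DIVISIONS):
        print(game_date.isoformat(), division, url)
```

Keeping the loop in a generator means the scraping code only ever sees one date and division at a time, so the date window can be changed in the constants file without touching the scraper.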
Implements traversal of the NCAA homepage for all dates and divisions.
Scrapes all tables in the box score, collecting date, team, and stats information. The scraper must click on the team name to switch from away-team to home-team data. Uses WebDriverWait to check for missing data (a 404 error or a broken page).
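A minimal sketch of the missing-data check, assuming the scraper uses Selenium's Python bindings. The function names `box_score_loaded` and `split_links`, the 10-second timeout, and the choice to wait for any `table` element are all illustrative assumptions, not the project's actual code.

```python
def box_score_loaded(driver, timeout=10):
    """Return True if a <table> element appears within `timeout` seconds."""
    # Imported here so split_links below stays usable without Selenium installed.
    from selenium.common.exceptions import TimeoutException
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    try:
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.TAG_NAME, "table"))
        )
        return True
    except TimeoutException:
        # No table showed up in time: treat as a 404 or broken page.
        return False


def split_links(links, is_loaded):
    """Partition links into (scraped, error_links) using the is_loaded check."""
    scraped, error_links = [], []
    for link in links:
        if is_loaded(link):
            scraped.append(link)
        else:
            error_links.append(link)
    return scraped, error_links
```

In a real run, `is_loaded` would be a small function that calls `driver.get(link)` and then returns `box_score_loaded(driver)`. Passing the check in as a callable also makes it possible to test the bookkeeping without a browser.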
The naming convention is date-away team-home team. errorLinks currently stores all links where data is missing. It has not yet been determined whether missing data is consistently due to broken pages or to broken code.
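The naming convention can be sketched as a small helper. The function name `box_score_filename`, the ISO date format, and the `.csv` extension are assumptions made for illustration; the team names in the usage line are placeholders.

```python
from datetime import date


def box_score_filename(game_date, away_team, home_team, extension="csv"):
    """Build a file name of the form date-away team-home team.extension."""
    parts = [game_date.isoformat(), away_team.strip(), home_team.strip()]
    return "-".join(parts) + "." + extension


if __name__ == "__main__":
    print(box_score_filename(date(2013, 9, 7), "Away U", "Home U"))
```

Because an ISO date already contains hyphens, splitting such a name back into its parts is easier from the right (for example with `rsplit("-", 2)`) than from the left, and only works cleanly when team names contain no hyphens.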