Web Scraping Project

Code to scrape NCAA football box scores for all divisions from www.ncaa.com, beginning with the 2013-2014 season.

To Do

Scrape box score data from the new website for the current season: https://www.ncaa.com/scoreboard/football/d3
    1. Decide how configurable scoreboard scraping should be for future runs, and keep the run code easy to read.
Clean data to ROUTE SQL standards
    1. ID checks for all CSV files
    2. Check of error links
    3. Other data cleaning?
    4. Potential compilation files?

Functionality

main

Uses the dates in the constants file to loop over all applicable dates and divisions.
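
A minimal sketch of such a loop, assuming the constants file provides a start date, an end date, and a list of division names. `START_DATE`, `END_DATE`, `DIVISIONS`, `date_range`, and `scrape_jobs` are hypothetical names used for illustration, and only the `d3` division name is taken from the scoreboard URL above; the others are guesses.

```python
from datetime import date, timedelta
from itertools import product

# Hypothetical stand-ins for the values kept in the constants file.
START_DATE = date(2013, 8, 29)
END_DATE = date(2013, 8, 31)
DIVISIONS = ["fbs", "fcs", "d2", "d3"]  # only "d3" appears in this README


def date_range(start, end):
    """Yield every date from start to end, inclusive."""
    for offset in range((end - start).days + 1):
        yield start + timedelta(days=offset)


def scrape_jobs(start, end, divisions):
    """Return every (date, division) pair the main loop should visit."""
    return list(product(date_range(start, end), divisions))


if __name__ == "__main__":
    for day, division in scrape_jobs(START_DATE, END_DATE, DIVISIONS):
        # The real code would hand each pair to menuSelect and scrapeBox.
        print(day.isoformat(), division)
```

Building the full list of pairs up front makes it easy to count the jobs or resume a run from a given date.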

menuSelect

Traverses the NCAA homepage for all dates and divisions.

scrapeBox

Scrapes all tables in the box score, collecting date, team, and stats information. The scraper must click on the team name to switch from away-team to home-team data. Uses webDriverWait to detect missing data (a 404 error or a broken page).
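
The wait-then-give-up pattern can be illustrated without a browser. The sketch below is a hand-rolled polling loop in plain Python, not the project's code and not Selenium's own wait class. `wait_for`, `scrape_or_log`, `MissingDataError`, and the `find_tables` callback are hypothetical names; `find_tables` stands in for whatever locates the box score tables on a loaded page.

```python
import time


class MissingDataError(Exception):
    """Raised when the box score tables never appear within the timeout."""


def wait_for(condition, timeout=10.0, poll_interval=0.5):
    """Call condition() until it returns a truthy value or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise MissingDataError(f"no data after {timeout} seconds")
        time.sleep(poll_interval)


def scrape_or_log(url, find_tables, error_links, timeout=10.0):
    """Return the tables found for url, or record url in error_links and return None."""
    try:
        return wait_for(lambda: find_tables(url), timeout=timeout)
    except MissingDataError:
        # A timeout is treated as missing data (404 or broken page).
        error_links.append(url)
        return None
```

Treating a timeout as "missing data" keeps one bad page from stopping a long run, at the cost of not distinguishing a broken page from a slow one.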

data

Files are named date-away team-home team. errorLinks currently stores every link where data is missing. It has not yet been checked whether missing data is consistently due to broken pages or to broken code.
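
As an illustration of the naming convention, the sketch below builds a file name from a date and two team names. The ISO date format, the `.csv` extension, and the function name `box_score_filename` are assumptions for the example, and the team names are placeholders.

```python
from datetime import date


def box_score_filename(game_date, away_team, home_team, extension=".csv"):
    """Build a file name following the date-away team-home team convention."""
    def tidy(team):
        # Path separators in a team name would create unintended subfolders.
        return team.strip().replace("/", " ").replace("\\", " ")

    return f"{game_date.isoformat()}-{tidy(away_team)}-{tidy(home_team)}{extension}"


if __name__ == "__main__":
    # Prints: 2013-09-07-Away U-Home U.csv
    print(box_score_filename(date(2013, 9, 7), "Away U", "Home U"))
```

Because an ISO date also contains hyphens, code that parses these names back should split off the first ten characters as the date before separating the teams.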
