This week, we continued working on scraping our selected data sources, with the goal of scraping Spot Crime's data. Spot Crime's site is structured differently from Great Schools', so a different approach was needed to get everything we want. What made this a bit more difficult was how the site links to the specific crime maps we needed. Writing the script to obtain all of the links to the state pages was simple, because the URL structure for each state is consistent: spotcrime.com/&lt;state abbreviation&gt;. The difficulty lies in the state pages themselves: each one consists of three tables of links.
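Since the state URLs follow a fixed pattern, generating them is a one-liner. Here is a minimal sketch, assuming the two-letter postal abbreviations map directly onto Spot Crime's URL scheme; the abbreviation list is truncated for brevity.

```python
# Build state-page URLs from postal abbreviations, assuming the
# spotcrime.com/<state abbreviation> pattern holds for every state.
STATE_ABBREVS = ["al", "ak", "az", "ar", "ca"]  # ...remaining 45 states

def state_page_urls(abbrevs):
    """Return the list of state-page URLs for the given abbreviations."""
    return ["https://spotcrime.com/" + ab for ab in abbrevs]
```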
After studying the structure of the site, we figured our best approach would be to pull the first a tag after every td tag. We should be able to do this because each state page uses the same class name for its tables, so knowing that class will let us grab all of the tables from each state page. Once we do that, we can use Python with the BeautifulSoup library to locate the links for each crime map.
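The "first a tag after every td tag" idea might look something like the sketch below. The class name "crime-table" and the sample HTML are placeholders, not the actual markup from Spot Crime's pages.

```python
from bs4 import BeautifulSoup

# Placeholder markup standing in for a Spot Crime state page; the real
# table class name will differ.
SAMPLE_HTML = """
<table class="crime-table">
  <tr>
    <td><a href="/tx/harris-county">Harris County</a></td>
    <td><a href="/tx/travis-county">Travis County</a></td>
  </tr>
</table>
"""

def county_links(html, table_class="crime-table"):
    """Collect the href of the first <a> inside every <td> of each table."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    for table in soup.find_all("table", class_=table_class):
        for td in table.find_all("td"):
            a = td.find("a")  # first <a> within this cell, if any
            if a and a.get("href"):
                links.append(a["href"])
    return links
```

Keying off the shared table class keeps the selector stable across all fifty state pages, so one function covers every state.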
Each county's crime map page contains a list of crimes and provides the type, date, and location. When you select a crime instance, you are brought to a detailed description of the crime that lists the time, case number (sometimes), and a summary of the crime. Saul wanted to take on the task of pulling the crime maps and data.
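The fields described above suggest a simple record shape for the scraped data. This is just one possible schema; the field names are ours, and the case number is optional because not every detail page lists one.

```python
from dataclasses import dataclass
from typing import Optional

# A possible record for one scraped crime, based on the fields the
# detail pages expose. Dates/times are kept as strings as scraped.
@dataclass
class CrimeRecord:
    crime_type: str
    date: str
    time: str
    location: str
    summary: str
    case_number: Optional[str] = None  # not every page provides one
```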
For next week, I plan to figure out a way to gather the latitude/longitude of all of our addresses, or see whether there is another approach we can take using addresses alone. I would also like to move on to scraping our next data source.
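One way the lat/long step could be structured is a batch geocoder that takes the lookup function as a parameter, so any service (a Nominatim client, Google's geocoding API, etc.) can be plugged in later. This is a sketch of the plumbing only; `geocode` is a hypothetical callable, not a specific API.

```python
# Batch-geocode scraped addresses. The geocode callable is injected so the
# actual service can be chosen later; a failed lookup yields None instead
# of aborting the whole batch.
def geocode_addresses(addresses, geocode):
    """Map each address string to a (lat, lon) pair, or None on failure."""
    coords = {}
    for addr in addresses:
        try:
            coords[addr] = geocode(addr)
        except Exception:
            coords[addr] = None  # keep going if one lookup fails
    return coords
```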