The goal of this week was to improve our ontology by familiarizing ourselves with the data we will be receiving from our data sources. We met with Dr. Bansal, and I was able to get feedback on the questions that arose while building the ontology. She clarified how we will use the sameAs property to link our instances to sources such as schema.org or DBpedia. She also explained that the names we choose for our classes do not have to align with prior vocabularies.
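To make the sameAs idea concrete, here is a minimal sketch that emits N-Triples lines linking one of our own instances to an external resource. It assumes the OWL property owl:sameAs (http://www.w3.org/2002/07/owl#sameAs); schema.org also defines its own schema:sameAs, and which one we end up using is still an open assumption. The namespace http://example.org/ontology/, the instance name City_Chicago, and the helper functions are hypothetical placeholders, not part of our actual ontology.

```python
# Sketch: emit sameAs links from our own instances to external resources.
# ASSUMPTIONS: we use owl:sameAs (not schema:sameAs), and our instances live
# under the hypothetical namespace below. Names here are illustrative only.

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
OUR_NS = "http://example.org/ontology/"  # hypothetical namespace for our instances


def same_as_triple(local_name: str, external_uri: str) -> str:
    """Return one N-Triples line stating our instance is the same as an external resource."""
    subject = f"<{OUR_NS}{local_name}>"
    return f"{subject} <{OWL_SAME_AS}> <{external_uri}> ."


def same_as_triples(links: dict[str, list[str]]) -> list[str]:
    """Build one sameAs triple for every (local instance, external URI) pair."""
    return [
        same_as_triple(name, uri)
        for name, uris in links.items()
        for uri in uris
    ]


if __name__ == "__main__":
    links = {
        # Our local name does not have to match the external vocabulary;
        # sameAs only asserts that both URIs denote the same thing.
        "City_Chicago": ["http://dbpedia.org/resource/Chicago"],
    }
    for line in same_as_triples(links):
        print(line)
```

Keeping the links in a plain dictionary means new external sources (schema.org, DBpedia, or others) can be added per instance without changing how our own classes are named.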
Since we decided to revisit the ontology once we finalize our data set, we went back to locating our data. We obtained API keys for our various sources and thought that would be sufficient. The issue we now face is the limitations of the APIs. For example, the GreatSchools API has a limit of 3,000 calls per day. This seems very restrictive, but since we are only looking to obtain a pool of data, it can still work because of the methods the API provides. One call searches for nearby schools: we simply provide an address and a radius to obtain a list of schools. If we enter a large enough radius, we should in principle receive a list of all schools in the US, which is exactly what we need. From there, we can identify all of the properties included for each school and begin to organize them within our schema.
Not all of the APIs we gained access to offered the same level of data access. I looked into Zillow's API to obtain lists of properties that are either for sale or for rent. After obtaining the API key, I reviewed the methods it provides and found its usage to be very limited: we are restricted to viewing one property per call. Given the number of properties for sale or for rent, we would quickly reach our call limit if we used this in a live application. My group members ran into similar issues with the APIs they were researching, so we knew we would need another approach to obtain the data.
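Because these limits are daily budgets, it helps to count calls explicitly and to estimate how long a bulk pull would take. The sketch below is a minimal in-memory counter, assuming the 3,000-calls-per-day GreatSchools limit mentioned above; it makes no real HTTP requests. The class and function names are hypothetical, Zillow's actual quota is not stated in this report, and the 100,000-listing figure in the usage line is made up purely to show the one-property-per-call arithmetic.

```python
import math
from datetime import date


class DailyCallBudget:
    """Track API calls against a fixed per-day limit (e.g. 3,000 for GreatSchools).

    Minimal in-memory sketch: the counter resets when the calendar date changes.
    It does not persist across restarts and is not thread-safe.
    """

    def __init__(self, limit_per_day: int, today=date.today):
        self.limit = limit_per_day
        self._today = today  # injectable clock, handy for testing
        self._day = self._today()
        self._used = 0

    def _roll_over(self) -> None:
        # Reset the counter on the first check of a new day.
        current = self._today()
        if current != self._day:
            self._day = current
            self._used = 0

    def remaining(self) -> int:
        self._roll_over()
        return self.limit - self._used

    def try_consume(self) -> bool:
        """Record one call if budget remains; return False once the limit is reached."""
        self._roll_over()
        if self._used >= self.limit:
            return False
        self._used += 1
        return True


def days_needed(items: int, items_per_call: int, calls_per_day: int) -> int:
    """Days required to fetch `items` records given per-call and per-day limits."""
    calls = math.ceil(items / items_per_call)
    return math.ceil(calls / calls_per_day)


if __name__ == "__main__":
    # HYPOTHETICAL: 100,000 listings, one listing per call, 3,000 calls per day.
    print(days_needed(100_000, items_per_call=1, calls_per_day=3_000))  # prints 34
```

The arithmetic shows why a one-record-per-call API is the bottleneck: a search method that returns many records per call (like the nearby-schools radius search) divides the number of calls, and therefore the number of days, by the page size.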
Dr. Bansal provided us with a research paper she was involved in that covered building a linked data application. In the paper, the authors resorted to screen scraping websites to obtain the data they wanted. I am familiar with the process and understand the basic concepts of how it works, but I have never had any hands-on experience. There are many techniques for screen scraping, and it can be done in a number of different programming languages. Instead of diving into screen scraping without any guidance or best practices, I will ask Dr. Bansal for suggestions and resources. I hope to become familiar with screen scraping and comfortable writing scraping scripts by the end of this year.
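As a first hands-on step, the basic idea of screen scraping can be sketched with Python's standard-library html.parser: walk the page's HTML and pull out the pieces of text we care about. This is one generic technique, not the approach used in the paper, and the listing table below is an invented sample. In a real script the HTML would come from an HTTP request (for example via urllib.request.urlopen); here it is inline so the sketch runs offline.

```python
from html.parser import HTMLParser


class TableRowScraper(HTMLParser):
    """Collect the text of each <td> cell, grouped by <tr> row.

    Rows containing only <th> header cells produce no <td> text and are skipped.
    """

    def __init__(self):
        super().__init__()
        self.rows: list[list[str]] = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # skip header-only or empty rows
                self.rows.append(self._row)
            self._row = None


def scrape_rows(html: str) -> list[list[str]]:
    """Parse an HTML string and return the text of its table data cells, row by row."""
    parser = TableRowScraper()
    parser.feed(html)
    parser.close()
    return parser.rows


if __name__ == "__main__":
    # Invented sample page fragment; real pages will be messier.
    sample = """
    <table>
      <tr><th>Address</th><th>Price</th></tr>
      <tr><td>123 Main St</td><td>$250,000</td></tr>
      <tr><td>456 Oak Ave</td><td>$1,200/mo</td></tr>
    </table>
    """
    print(scrape_rows(sample))
```

A scraper like this is tied to the structure of one page, so it breaks when the site's markup changes; that fragility is one reason to ask for best practices before writing scrapers against live sites.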