Dr. Bansal provided us with a research article written by her and Sebastian Kagemann titled "MOOCLink: Building and Utilizing Linked Data from Massive Open Online Courses". The article describes a research project in which they used data from multiple MOOC websites such as Coursera, Udacity, and edX. It begins with background on the guidelines and resources used to build the project.
The section on web crawling explained how Coursera's data was gathered through its course catalog API. I had heard the term API used a lot, both at school and in the web industry, but I had never understood exactly what it was or how it was used. I did a Google search for Coursera's Catalog API and found the base URLs. Coursera uses JSON as its data exchange format, and each URL returns what looks like a file of JSON objects describing the courses on the site. This data is public, so anyone can query it.
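To make the idea of querying a JSON catalog concrete for myself, here is a minimal Python sketch of how a response like this might be parsed. The payload is made up for illustration, and the field names (`elements`, `id`, `shortName`, `name`) are assumptions about the response shape, not a confirmed description of Coursera's API.

```python
import json

# Hypothetical sample payload; the field names are assumptions for illustration.
SAMPLE_RESPONSE = """
{
  "elements": [
    {"id": 1, "shortName": "ml", "name": "Machine Learning"},
    {"id": 2, "shortName": "algo", "name": "Algorithms"}
  ]
}
"""


def parse_courses(raw_json):
    """Return a list of (id, name) pairs from a catalog-style JSON string."""
    payload = json.loads(raw_json)
    # Fall back to an empty list if the assumed "elements" key is missing.
    return [(c["id"], c["name"]) for c in payload.get("elements", [])]


courses = parse_courses(SAMPLE_RESPONSE)
for course_id, name in courses:
    print(course_id, name)
```

In a real script the JSON string would come from an HTTP request to the catalog URL (for example with `urllib.request.urlopen`) instead of a hard-coded sample.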
I was also not familiar with web crawlers and had never heard the term screen scraper, so I began researching these topics. The amount of information about web crawlers on the web is a bit overwhelming. Web crawlers can be written in many languages, and there are also numerous web scraping applications; this site contains a good list of apps with a description of each one's use.
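As a first experiment, the core step of a crawler (finding the links on a page so they can be visited next) might look something like the sketch below. It uses only Python's standard library; the sample HTML and the example.org URLs are hypothetical, and a real crawler would also need to fetch pages, avoid revisiting URLs, and respect each site's rules.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect absolute URLs from the href attribute of every <a> tag."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page's own URL.
                self.links.append(urljoin(self.base_url, value))


# Hypothetical page content and URL, used only for illustration.
SAMPLE_HTML = (
    '<a href="/courses/intro">Intro</a> '
    '<a href="https://example.org/about">About</a>'
)

collector = LinkCollector("https://example.org/catalog")
collector.feed(SAMPLE_HTML)
print(collector.links)
```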
This paper gave me a better understanding of how we will approach this research project. I was interested in their use of schema.org to find the CreativeWork vocabulary and wanted to learn how to use schema.org's vocabularies, so I visited the site. There, I found a really good Getting Started tutorial that explained how to use the vocabularies. The tutorial covered how to mark up a webpage by adding attributes to its HTML5 tags, a format referred to as Microdata.
Along with going through the tutorial on schema.org's website, I also found a Google project on their codelabs site that involves an Android application and shows how to use linked data to integrate with the voice commands of an Android device. A video series released on YouTube explains the concept of linked data and how to apply it to the project. The video series also covers an open source graph database called Cayley that was inspired by Freebase and Google's Knowledge Graph. I learned that Google's Knowledge Graph is the technology that powers the boxes in Google search results that provide direct answers to your search instead of just a list of links. Cayley provides an interface that displays a graph representation of data that has been structured as triples. Seeing these different ways of using linked data made me aware that the technology applies to large systems and has a very wide range of uses.
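To check my understanding of triples, I sketched how one course record could be written as subject, predicate, object statements. This is only an illustration: the example.org subject URI and the sample values are made up, the schema.org property names are ones I chose for the example, and the output follows the N-Triples style that, as far as I understand, graph stores such as Cayley can import.

```python
def to_triple_lines(course_id, properties):
    """Turn one course's properties into subject-predicate-object lines."""
    # Hypothetical subject URI; a real project would pick its own namespace.
    subject = "<https://example.org/course/%s>" % course_id
    lines = []
    for prop, value in properties.items():
        predicate = "<http://schema.org/%s>" % prop
        # Escape backslashes and quotes so the literal stays well formed.
        literal = value.replace("\\", "\\\\").replace('"', '\\"')
        lines.append('%s %s "%s" .' % (subject, predicate, literal))
    return lines


# Made-up course data for illustration.
triples = to_triple_lines("ml", {"name": "Machine Learning", "inLanguage": "en"})
for line in triples:
    print(line)
```

Each printed line is one edge in the graph: the course is the subject, the schema.org property is the predicate, and the value is the object.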
I plan to write some test scripts and load the data I was able to gather into Cayley to create a basic knowledge graph.