There was a gap in my understanding of how publicly linked data is used and maintained in an application, especially when the data needs to be dynamic. I understood that there are public APIs that can be used for such tasks, but not exactly how. It was also unclear to me how to keep data obtained via APIs linked by applying what we have learned so far about linked data. Would this require manually tagging the entire data set? I decided to look for examples and tutorials that would help me better understand these concepts.

Through my research I learned that linked data has been underused in the past because it relied on manual tagging, and that tagging was inconsistent. This made it time-consuming to use and required screen scraping individual sites to obtain or manually link the data. There are now services that act as repositories for linked data sets. That raises more questions: why is there not one source that all data links to? Does this duplication of data cause gaps in the connections between linked data? These are questions I intend to ask Dr. Bansal for clarification on.
JSON-LD is a method for serializing and transferring linked data. This talk really piqued my interest in why someone would use JSON-LD over other methods of transferring data. I found a site dedicated to explaining what JSON-LD is and how to use it. The site contains many useful tutorials covering everything from linked data basics to the issues we face with linked data and how JSON-LD aims to resolve some of those issues. JSON-LD's main purpose is to resolve the ambiguity among naming schemes from our data sources by giving the data a context, mainly when obtaining data via JSON.
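To make that idea concrete for myself, here is a minimal Python sketch of how a JSON-LD `@context` disambiguates plain JSON keys. The field names and the naive one-level expansion are my own illustration, not a full JSON-LD processor; `name` and `url` are genuine schema.org terms.

```python
import json

# Two sources might both use the ambiguous key "name"; a JSON-LD
# @context maps each local key to a globally unique IRI so the
# meaning travels with the data.
doc = {
    "@context": {
        "name": "http://schema.org/name",
        "homepage": "http://schema.org/url",
    },
    "name": "Jane Doe",
    "homepage": "http://example.com/jane",
}

def expand(document):
    """Naively replace each top-level key with the IRI its @context assigns."""
    context = document.get("@context", {})
    return {context.get(k, k): v
            for k, v in document.items() if k != "@context"}

print(json.dumps(expand(doc), indent=2))
```

After expansion, every key is an unambiguous IRI, so two applications that never agreed on field names can still merge the data.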
My goal for next week is to try creating an API-based web application.
We were assigned to complete a tutorial for building an ontology for Pizza in Protege. Protege is an ontology development environment and is currently on version 5.0. The Pizza Ontology tutorial referenced version 4, so a lot of the steps did not apply to the new version. Instead of downgrading to an older version of Protege, I decided to figure out how to complete the tutorial with the new layout. The tutorial covered creating a class hierarchy, disjointing classes, creating relationships between classes through object properties, creating an object property hierarchy, inverse properties, functional properties, transitive properties, symmetric properties, property domain/range, and property restrictions. Below is a table I put together that provides an overview of properties:
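To check my understanding of one of these characteristics, I sketched in Python what a reasoner does with a transitive object property. The class and property names below are made up for illustration and are not the tutorial's exact ones; the point is that from hasPart(a, b) and hasPart(b, c), transitivity lets us infer hasPart(a, c).

```python
# Asserted facts, written as (subject, property, object) triples.
asserted = {
    ("MargheritaPizza", "hasPart", "TomatoTopping"),
    ("TomatoTopping", "hasPart", "Tomato"),
}

def transitive_closure(triples, prop):
    """Keep adding (a, prop, c) whenever (a, prop, b) and (b, prop, c) hold."""
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        for (a, p1, b) in list(inferred):
            for (b2, p2, c) in list(inferred):
                if p1 == p2 == prop and b == b2 and (a, prop, c) not in inferred:
                    inferred.add((a, prop, c))
                    changed = True
    return inferred

closed = transitive_closure(asserted, "hasPart")
# The new fact ("MargheritaPizza", "hasPart", "Tomato") is now inferred.
```

In Protege you only assert the first two facts and mark the property transitive; the reasoner derives the third.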
Dr. Bansal also asked us to begin researching public data sources that would be useful for the real estate application we will be developing. Before researching, I put together a list of data sources I would like our application to utilize:
Now, locating those data sets online is the real challenge. I'm familiar with websites that are known to provide open data, such as data.gov, which offers many data sets with their APIs made publicly available. Another good open data site is freebase.com.
Dr. Bansal provided us with a research article she co-authored with Sebastian Kagemann titled MOOCLink: Building and Utilizing Linked Data from Massive Open Online Courses. The article describes a research project to utilize the data from multiple MOOC websites such as Coursera, Udacity, and edX. It begins with background highlighting the guidelines and resources used to build the project.
The web-crawling section explained how Coursera's data was gathered through their course catalog API. I have heard the term API used a lot, both at school and in the web industry, but I had never understood exactly what it was or how it was used. A Google search for Coursera's catalog API turned up the base URLs. Coursera uses JSON as their data exchange format and provides links to what looks to be a file containing many JSON objects with information about the courses on their site. This data is made public, so anyone can query it.
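As an exercise, I sketched what consuming a catalog-style JSON response might look like in Python. The payload shape below (an "elements" list of course objects) is an assumption for illustration only; the real Coursera catalog API response may be structured differently, and a real client would fetch it over HTTP rather than parse a string.

```python
import json

# A stand-in for the body of a catalog API response; the structure
# is assumed for this sketch, not taken from Coursera's documentation.
sample_response = """
{
  "elements": [
    {"name": "Machine Learning", "language": "en"},
    {"name": "Databases", "language": "en"}
  ]
}
"""

def course_names(raw_json):
    """Pull the course titles out of a catalog-style JSON payload."""
    data = json.loads(raw_json)
    return [course["name"] for course in data.get("elements", [])]

print(course_names(sample_response))  # ['Machine Learning', 'Databases']
```

The key takeaway for me is that "using an API" here mostly means issuing an HTTP request and walking the JSON that comes back.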
I was also not familiar with web crawlers and had never heard the term screen scraper, so I began researching these topics. The amount of information on web crawlers on the web is a bit overwhelming. Web crawlers can be written in many languages, and there are also numerous web scraping applications; this site contains a good list of apps with a description of each one's use.
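To demystify the idea for myself, here is the simplest screen-scraping sketch I could write using only Python's standard library: parse an HTML page and collect the link targets. The sample page is invented; a real crawler would fetch pages with urllib and then follow the links it finds.

```python
from html.parser import HTMLParser

class LinkScraper(HTMLParser):
    """Collect the href of every anchor tag on a page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

page = '<html><body><a href="/courses">Courses</a> <a href="/about">About</a></body></html>'
scraper = LinkScraper()
scraper.feed(page)
print(scraper.links)  # ['/courses', '/about']
```

A crawler is essentially this loop repeated: download a page, extract its links, queue the links, repeat.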
This paper gave me a better understanding of how we will approach this research project. I was interested in their use of schema.org for the CreativeWork vocabulary and wanted to learn how to use schema.org's vocabularies, so I visited their site. There I found a really good Getting Started tutorial that explained how to mark up a webpage using HTML5 attributes, an approach referred to as microdata.

Along with the tutorial on schema.org's website, I also found a Google project on their Codelabs site involving an Android application that uses linked data to integrate with the voice commands of an Android device. A video series released on YouTube explains the concept of linked data and how to apply it to the project. The series also covers Cayley, an open source graph database inspired by Freebase and Google's Knowledge Graph. I learned that the Knowledge Graph is the technology that powers the boxes in Google search results that provide direct answers to your search instead of just a list of links. Cayley provides an interface that displays a graph representation of data structured as triples. Seeing these different uses of linked data made me aware that this technology applies to large systems and has virtually infinite uses.
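To practice the microdata idea, I wrote a small Python helper that emits an HTML snippet annotated the way the schema.org tutorial describes: `itemscope`/`itemtype` declare the item's type and `itemprop` tags each property. The `name` and `url` properties are genuine CreativeWork properties on schema.org, but the overall snippet is my own sketch, not markup from the tutorial.

```python
def creative_work_microdata(name, url):
    """Return an HTML5 snippet annotated with schema.org microdata attributes."""
    return (
        '<div itemscope itemtype="http://schema.org/CreativeWork">\n'
        f'  <span itemprop="name">{name}</span>\n'
        f'  <a itemprop="url" href="{url}">{url}</a>\n'
        '</div>'
    )

print(creative_work_microdata("MOOCLink", "http://example.com/mooclink"))
```

A search engine reading this page can now extract a (page, schema.org/name, "MOOCLink") triple instead of guessing at unlabeled text.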
I plan to write some test scripts that take the data I was able to gather and integrate it with the Cayley tool to create a basic knowledge graph.
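Since Cayley can load data in the N-Quads format, my first step will probably be a script along these lines, which formats (subject, predicate, object) triples as N-Quad lines. The IRIs below are placeholders I invented for the sketch, and it only handles the simple case of IRI subjects/predicates with plain string objects.

```python
def to_nquads(triples):
    """Format (subject, predicate, object) triples as N-Quad lines.

    Subjects and predicates are written as IRIs in angle brackets and
    objects as quoted string literals -- the simplest N-Quads case.
    """
    return "\n".join(f'<{s}> <{p}> "{o}" .' for s, p, o in triples)

triples = [
    ("http://example.com/course/1", "http://schema.org/name", "Intro to Linked Data"),
]
print(to_nquads(triples))
# <http://example.com/course/1> <http://schema.org/name> "Intro to Linked Data" .
```

A file of such lines should be loadable into Cayley, after which its visual interface can display the resulting graph.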