Wednesday, October 12, 2011

Attaching licenses and rights to linked data

At present, data providers everywhere are thinking about breaking their data silos and publishing their data on the web.


The growth of the web data cloud creates new challenges. Vendors that consume Linked Data must evaluate the quality and trustworthiness of the data. For example, the concept "Europe" from AGROVOC has been mapped to a concept "Europe" from DBpedia. When a user opens the HTML page for Europe in AGROVOC, sees the mapping link to DBpedia and clicks it, the link returns an HTML page for "Europe" that contains adult content.

Users might get the wrong impression of Linked Open Data. For this reason, data quality dimensions such as accuracy, timeliness, reliability and trustworthiness are very important.


A common approach to assessing data quality is the analysis of provenance. Provenance is one of the main factors that influence users' trust on the web.

Provenance has been widely used in art history and archival studies to track the history of ownership of a valued object or a work of art or literature. In computer science, the study of provenance can be traced back to work on view updates in the database community since 1970. In recent years, the study of provenance has attracted wide attention for web content because of Linked Data (LD). Furthermore, provenance information about a data item is information about the history of that item, starting from its creation and including information about its origins. Several terms have been used to name the tracing of the origin of data, such as pedigree, lineage, audit and provenance.

We can divide provenance by granularity into workflow provenance and data provenance. In our case, we consider data provenance. A simple way to represent provenance information is to introduce some context information into the data. Data can be structured or unstructured. To deal with unstructured data, we need to convert it into structured data. One solution is to produce RDF (Resource Description Framework), so that we can define a concept and a URI from the unstructured data. Each concept has properties and attributes. By adding properties to a concept, we can exchange secure information. I would like to propose concept-based provenance for the AGROVOC concepts, but the same approach can be followed for any application.
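
To make the idea of attaching provenance properties to a concept more concrete, here is a minimal sketch in plain Python that writes a few provenance and license statements about one concept as N-Triples. It is only an illustration under stated assumptions, not the hybrid model mentioned in this post: the concept URI, the creator URI and the dataset URI are hypothetical (example.org), and the choice of Dublin Core terms (creator, created, source, license) is my own guess at a reasonable starting point.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]

# Real vocabulary namespaces.
DCTERMS = "http://purl.org/dc/terms/"
SKOS = "http://www.w3.org/2004/02/skos/core#"
XSD_DATE = "http://www.w3.org/2001/XMLSchema#date"


def uri(value: str) -> str:
    """Format a URI as an N-Triples term."""
    return f"<{value}>"


def literal(value: str, datatype: str = "") -> str:
    """Format a string as an N-Triples literal, optionally typed."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    suffix = f"^^<{datatype}>" if datatype else ""
    return f'"{escaped}"{suffix}'


def describe_concept(concept: str, label: str, creator: str,
                     created: str, source: str, license_uri: str) -> List[Triple]:
    """Return triples that attach provenance and license info to a concept."""
    subject = uri(concept)
    return [
        (subject, uri(SKOS + "prefLabel"), literal(label)),
        (subject, uri(DCTERMS + "creator"), uri(creator)),
        (subject, uri(DCTERMS + "created"), literal(created, XSD_DATE)),
        (subject, uri(DCTERMS + "source"), uri(source)),
        (subject, uri(DCTERMS + "license"), uri(license_uri)),
    ]


def to_ntriples(triples: List[Triple]) -> str:
    """Serialize triples, one statement per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


# All example.org URIs below are hypothetical placeholders.
triples = describe_concept(
    concept="http://example.org/thesaurus/c_europe",
    label="Europe",
    creator="http://example.org/org/maintainer",
    created="2011-10-12",
    source="http://example.org/dataset/thesaurus",
    license_uri="http://creativecommons.org/licenses/by/3.0/",
)
print(to_ntriples(triples))
```

A consumer of the data could then read the creator, source and license statements for a concept before deciding whether to trust or reuse it.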


Different vocabularies (such as DC, FOAF, SIOC, etc.) are used for describing provenance information. None of them is appropriate. People at the W3C are working on this issue, but there is no final solution yet. With my hybrid method, we can easily solve the provenance issues for thesauri, raw data or structured data. I am still working on the model... I hope to publish a new version soon :-).