Wednesday, October 12, 2011

Attaching licenses and rights to linked data

At present, many data providers are looking to break open their data silos and publish their data on the web.


The growth of the web data cloud creates new challenges. Anyone who consumes Linked Data must evaluate the quality and trustworthiness of the data. For example, the concept "Europe" from AGROVOC has been mapped to the concept "Europe" from DBpedia. A user opens the HTML page for Europe in AGROVOC, sees a mapping link to DBpedia, and clicks it, only to be returned an HTML page for Europe containing adult content.

Users might get the wrong impression of Linked Open Data. For this reason, data quality dimensions such as accuracy, timeliness, reliability and trustworthiness are very important.


A common approach to data quality is the analysis of provenance. It is one of the main factors that influence the trust of users in the web.

Provenance has long been used in art history and archival studies to track the history of ownership of a valued object or work of art or literature. The study of provenance in computer science can be traced back to the work on view updates in the database community in the 1970s. In recent years, the study of provenance has attracted wide attention for web content due to Linked Data. Provenance information about a data item is information about the history of that item, starting from its creation, including information about its origins. Several terms have been used to name the tracing of the origin of data, such as pedigree, lineage, audit trail and provenance.

We can divide provenance by granularity into workflow provenance and data provenance. In our case, we consider data provenance. In a simple way, we can represent provenance by introducing some context information into the data. Data can be structured or unstructured. To deal with unstructured data, we need to convert it into structured data. One solution is to produce RDF (Resource Description Framework), so that we can define a concept and a URI from the unstructured data. Each concept has properties and attributes. By adding properties to a concept, we can exchange its provenance information. I would like to propose a concept-based provenance model for AGROVOC concepts, but it could be followed for any application.
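As a minimal sketch of this idea, the snippet below attaches provenance properties (Dublin Core terms such as dct:source, dct:created and dct:creator) to a single concept and serializes them as Turtle. The concept URI and the property choices are illustrative, not the actual AGROVOC provenance model.

```python
# Sketch: attaching provenance properties to a thesaurus concept as RDF triples.
# The concept URI and property choices are illustrative, not the real model.

AGROVOC = "http://aims.fao.org/aos/agrovoc/"
DCT = "http://purl.org/dc/terms/"

def provenance_triples(concept_uri, source, created, creator):
    """Return (subject, predicate, object) triples carrying provenance."""
    return [
        (concept_uri, DCT + "source", source),
        (concept_uri, DCT + "created", created),
        (concept_uri, DCT + "creator", creator),
    ]

def to_turtle(triples):
    """Serialize triples as N-Triples-style Turtle lines."""
    lines = []
    for s, p, o in triples:
        obj = f"<{o}>" if o.startswith("http") else f'"{o}"'
        lines.append(f"<{s}> <{p}> {obj} .")
    return "\n".join(lines)

triples = provenance_triples(
    AGROVOC + "c_2593",          # hypothetical URI for a concept
    "http://aims.fao.org/",      # origin of the data item
    "2011-04-14",                # creation date
    "FAO AGROVOC team",          # creator
)
print(to_turtle(triples))
```

The point is simply that provenance rides along with the concept as ordinary properties, so any Linked Data client that can read the concept can also read its history.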


Different vocabularies (such as DC, FOAF, SIOC, etc.) are used for describing provenance information, but none of them is fully appropriate on its own. The W3C is working on this issue, but there is no final solution yet. By using my hybrid method, we can easily solve the provenance issues for thesauri, raw data or structured data. I am still working on the model... I hope I will publish a new version soon :-).













Saturday, October 8, 2011

AGROVOC LOD integrated into the LOD cloud diagram

On 19th September, the LOD cloud group integrated AGROVOC LOD into their diagram. You can see all the mapping links that are connected with AGROVOC.


The AGROVOC LOD crossed another milestone.

Saturday, May 28, 2011

The life cycle of AGROVOC LOD and my experiences


Very few people can see a future vision clearly. One of them is our supervisor. It was his dream to publish AGROVOC as linked data. After my PhD, I went to Malaysia with the AGROVOC team, where we decided to convert AGROVOC from OWL to SKOS-XL. After a long discussion with Thomas Baker (an expert in SKOS) and Armando (my colleague from Rome), we started to map the OWL properties to the SKOS version. It was a successful workshop at MIMOS, Malaysia. We all came back to Rome. Armando started to work on the conversion process, since he was an expert on it. After a couple of months, we were ready to run the conversion. The conversion ran for 15 days, but we did not get any results. It was a nightmare situation. After nearly one month, our server crashed and we still had no results.

Then Sachit turned off the reasoner, and we got the conversion results within 1 hour. It was such an amazing moment that we could not express ourselves.

We had a SKOS file, but we did not have any mappings with other thesauri. One day my supervisor returned from Luxembourg and told me and Gudrun Johannsen, "We have to do something". I took those words very seriously and started mapping between different resources. At the beginning we published 20,000 mapping links with AGROVOC. At last the final day came to publish the LOD. You can find the news here:

http://aims.fao.org/news/agrovoc-thesaurus-released-linked-open-data

This is one of the big babies of my career after my PhD. I am very proud of the members of this LOD activity. Now, we are the first and largest agriculture Linked Open Data in the world...

"When I close my eyes, I can see the AGROVOC links moving here and there".......



Thursday, April 14, 2011

AGROVOC Thesaurus as Linked Open Data

We just announced the AGROVOC Thesaurus as Linked Open Data. You can find more information here:

http://aims.fao.org/news/agrovoc-thesaurus-released-linked-open-data


Soon, I am going to write something on it.

Thursday, March 17, 2011

A new face of thesauri

It was an amazing experience to work on making a new face of thesauri. In order to accomplish the work, we had to map 5 thesauri together. The mapping activities were not so difficult for me, since I had research experience in this field. What was difficult was understanding the formats of the different thesauri. I decided that I would consider only concept URIs and their labels. It worked perfectly.

So, I parsed all the files and put them into a database. Later on, I ran my existing routine, which was built on the INRIA mapping API. I used eight element-level matchers to perform these matching tasks. The results of all mappings were verified by a domain expert. Here is an example:



The concept "Europe" from AGROVOC is mapped to the NAL, EUROVOC, LCSH, GEMET and STW thesauri.
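In SKOS, a mapping like this can be expressed as skos:exactMatch triples, one per target thesaurus. The sketch below shows the shape; every URI here (including the AGROVOC one) is a placeholder, since each thesaurus has its own real identifiers.

```python
# Sketch: expressing the "Europe" mappings as skos:exactMatch triples.
# All URIs below are placeholders, not the real identifiers of each thesaurus.

SKOS_EXACT = "http://www.w3.org/2004/02/skos/core#exactMatch"
EUROPE = "http://aims.fao.org/aos/agrovoc/c_europe"  # hypothetical AGROVOC URI

targets = {
    "NAL":     "http://example.org/nal/europe",
    "EUROVOC": "http://example.org/eurovoc/europe",
    "LCSH":    "http://example.org/lcsh/europe",
    "GEMET":   "http://example.org/gemet/europe",
    "STW":     "http://example.org/stw/europe",
}

def exact_match_triples(subject, target_uris):
    """One skos:exactMatch triple from the subject to each target concept."""
    return [(subject, SKOS_EXACT, uri) for uri in target_uris.values()]

for s, p, o in exact_match_triples(EUROPE, targets):
    print(f"<{s}> <{p}> <{o}> .")
```

Once published, any of these five links lets a client hop from the AGROVOC concept into the corresponding thesaurus.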

One of the goals is to make a mother thesaurus that serves as a reference vocabulary, so that it can be used to retrieve and manage information. Furthermore, it can also serve as background knowledge for further mapping purposes. It is a new face of thesauri.

Wednesday, January 19, 2011

My working experiences in the mapping between thesauri

It was an amazing experience to work with large agriculture datasets in order to publish them as Linked Open Data. They can be used as background knowledge in the future, since there is no universal background knowledge dataset in the world. Recently, I was refreshing my mind with mapping techniques and results after my PhD. I ran my matcher between AGROVOC and EUROVOC and got a good number of exact matches (1,200). Again, I ran the matcher between AGROVOC and GEMET and found 1,150 exact matches. But I was extremely happy when I got 13,000 matches between AGROVOC and NAL. The idea is to put everything in one file which can serve as an online shop for agriculture information in the world and connect all people on one platform for learning and browsing information. Today, I am writing just an abstract view of my personal opinion. Soon, I will post matching techniques and difficulties. :-).
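The real matching used the INRIA alignment API with eight element-level matchers, but the simplest of them, an exact-label matcher, can be sketched in a few lines. The label data here is made up for illustration.

```python
# Toy exact-label matcher: finds concept pairs whose normalized preferred
# labels are identical. The real work used the INRIA alignment API with
# eight element-level matchers; this only illustrates the exact-match idea.

def normalize(label):
    """Lowercase and collapse whitespace so labels compare fairly."""
    return " ".join(label.lower().split())

def exact_label_matches(thesaurus_a, thesaurus_b):
    """Each thesaurus is a dict mapping concept URI -> preferred label."""
    by_label = {}
    for uri, label in thesaurus_b.items():
        by_label.setdefault(normalize(label), []).append(uri)
    matches = []
    for uri_a, label in thesaurus_a.items():
        for uri_b in by_label.get(normalize(label), []):
            matches.append((uri_a, uri_b))
    return matches

# Made-up sample data for illustration
agrovoc = {"agrovoc:c_1": "Europe", "agrovoc:c_2": "Maize"}
eurovoc = {"eurovoc:100277": "europe", "eurovoc:5101": "wheat"}
print(exact_label_matches(agrovoc, eurovoc))
# [('agrovoc:c_1', 'eurovoc:100277')]
```

Indexing the second thesaurus by normalized label keeps the comparison linear rather than quadratic, which matters when the thesauri have tens of thousands of concepts.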

Friday, January 7, 2011

The role of Information manager and Information flows at the moment

Even though the practice of metadata changed from catalogue cards to machine-readable formats, for a long time metadata was stored in repositories as electronic records. In recent years, the World Wide Web has been moving from a web of documents to a web of data. Metadata is moving from merely machine-readable towards machine-processable, where it is essential to break the record and repository silos and turn data (especially metadata) into machine-understandable pieces. We present information about the data using RDF (Resource Description Framework), which provides a data model for presenting metadata as machine-understandable and -processable triple statements (i.e., subject, predicate, object). For example, Adam (subject) is from (predicate) Peru (object). The subject of a triple is the URI (Uniform Resource Identifier) identifying the described resource, the predicate is the relationship between subject and object, and the object is a literal value or the URI of a resource that is somehow related to the subject. In the above example, Adam is a person and Peru is a country. Although they represent totally different metadata, they are linked through the predicate (property), which makes it possible to travel from the data about Adam to data about Peru, such as its population, environment, currency, culture, etc. Furthermore, the information will be connected into the Linked Open Data (LOD) cloud, where anybody can join, publish their data and access the data of others. This can be used by UN organizations, playing the role of one family providing services to the world. Linked data will have an impact on e-science, e-government, and e-agriculture.
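The Adam/Peru example above can be sketched as plain triples, with a tiny lookup that follows a predicate the way a linked-data client would. The URIs and the population figure are invented for illustration.

```python
# Sketch of the triple from the example above as (subject, predicate, object).
# URIs and the population figure are invented; real data would use
# dereferenceable URIs.

triples = [
    ("http://example.org/person/Adam",
     "http://example.org/ontology/isFrom",
     "http://example.org/country/Peru"),
    ("http://example.org/country/Peru",
     "http://example.org/ontology/population",
     "29000000"),  # a literal value as object
]

def objects_of(subject, predicate, graph):
    """Follow a predicate from a subject, as a linked-data client would."""
    return [o for s, p, o in graph if s == subject and p == predicate]

# Starting from Adam, we can hop to Peru and then to data about Peru.
(peru,) = objects_of("http://example.org/person/Adam",
                     "http://example.org/ontology/isFrom", triples)
print(objects_of(peru, "http://example.org/ontology/population", triples))
# ['29000000']
```

The hop from Adam to Peru's population is exactly the "link" in Linked Data: an object URI in one triple becomes the subject of the next lookup.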

The role of the information manager has changed due to this movement. Classical information managers managed information so that it was findable or searchable in the context of records in a small or large silo. Nowadays, information managers will need to manage information so that it is accessible, exchangeable, usable and reusable in the context of data that are semantically linked, far beyond the boundary of repositories or silos, and far beyond bibliographic data. In my opinion, in the next five years more metadata and raw data will be liberated from silos and join the LOD cloud. Agriculture data will form its own cloud, interacting with other scientific data, geographical data, government data, and civil society data. Here metadata is critical for the overall quality of that interaction and reuse. On the other hand, not every current Semantic Web technology has a clear implementation guideline; different models and tools are invented by implementers each month. This is very challenging, because it requires information managers to understand the whole landscape and master the technologies quickly and correctly. It is also the new role of information managers to compare different approaches, identify good practices, and contribute to best practices through their own work.

The year 2011 is about the personalization of the Web

2010 was the year of Linked Open Data. Many organizations introduced their data on the web in RDF format by breaking open their old databases. I think this year will bring a more agent-based approach to the web. For example, I am in Paris and I would like to get more information from the web about hotels, food, people, night clubs, etc. I was working on the project "Visit Finland". It was a first step towards making your own travel plan using second-generation web technologies. I expect this work to continue this year and in the coming years, and to make people's lives easier.