Thursday, December 3, 2009

Agrovoc as Linked Data

I have been meaning for a long time to write something about linked data, since I am working on the Agrovoc thesaurus. How can we publish it in a linked data format? Before starting, we need to clarify some questions.

  • What is linked data?
  • What are the uses of linked data?
  • How will Agrovoc work as linked data?

Saturday, October 10, 2009

Ecoterm Meeting

It was a great experience at the Ecoterm workshop on the 5th and 6th of October at FAO, Rome. I got a chance to present the AIMS registry and mapping projects. Besides this, I talked with several wonderful people. Gail was a wonderful lady with vast knowledge. From the workshop, I came to learn that we should take the initiative now to build environmental terminology for the future. Climate change is a big issue for the earth. We need a universal thesaurus for earth science and geoscience. I know it is difficult to build and difficult to maintain. We can keep our fingers crossed.

I was not satisfied with any of the mapping projects. Nobody explained their thinking clearly to me. Isaac asked me about prefLabel from SKOS. I have not found any ontology matching tool that uses SKOS files. In our case, we can use it by parsing the SKOS file.
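
Parsing a SKOS file for its prefLabel values can be sketched as follows. This is only a minimal illustration: it assumes the file is plain RDF/XML with one skos:Concept element per concept, and the sample data, the example.org URIs and the function name pref_labels are all made up for this sketch.

```python
import xml.etree.ElementTree as ET

# Namespaces used by SKOS files serialized as RDF/XML.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
}
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
RDF_ABOUT = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}about"

# Hypothetical sample data, not taken from a real thesaurus.
SAMPLE_SKOS = """<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:skos="http://www.w3.org/2004/02/skos/core#">
  <skos:Concept rdf:about="http://example.org/concept/1">
    <skos:prefLabel xml:lang="en">maize</skos:prefLabel>
    <skos:prefLabel xml:lang="it">mais</skos:prefLabel>
  </skos:Concept>
  <skos:Concept rdf:about="http://example.org/concept/2">
    <skos:prefLabel xml:lang="en">rice</skos:prefLabel>
  </skos:Concept>
</rdf:RDF>"""


def pref_labels(skos_xml, lang="en"):
    """Return {concept URI: prefLabel} for one language."""
    root = ET.fromstring(skos_xml)
    labels = {}
    for concept in root.findall("skos:Concept", NS):
        uri = concept.get(RDF_ABOUT)
        for label in concept.findall("skos:prefLabel", NS):
            if label.get(XML_LANG) == lang:
                labels[uri] = label.text
    return labels


english = pref_labels(SAMPLE_SKOS, "en")
```

The resulting dictionary of labels is what a string-based matcher would then compare. SKOS files written in other RDF/XML styles (for example with rdf:Description elements) would need a proper RDF parser instead.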


My idea was facet-based concept matching, but I was not smart enough to explain it well. Still, I feel that we can use it for matching purposes. A facet is a distinct feature of a concept that contains hidden knowledge about that concept. I am writing a paper on it now, hoping that it will be published and people will get to know more of my thoughts.

Tuesday, September 1, 2009

Meta data in KOS

I have been thinking of writing about Knowledge Organization Systems (KOS), since I am working in the same area.

To start with KOS, we have to know about metadata: simply, data about data.
Metadata for KOS:

* Name /title
* Acronym
* Owner/Creator
* Language
* Type
* Format
* Note
* Usage
* E-Mail
* Date
* Sources
* Version (we should keep version control so that we can walk through different versions if we ever need to).
* Usage/subject coverage/purpose/rating
* Signature (sometimes used for security reasons).
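
The list above could be captured as a simple record type. The sketch below is only one possible shape: the class name KosMetadata, the field names and the sample values are assumptions for illustration, and only a subset of the fields is shown.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class KosMetadata:
    """Descriptive record for one KOS; fields follow the list above."""
    name: str
    acronym: str = ""
    owner: str = ""
    language: str = ""
    kos_type: str = ""   # e.g. thesaurus, ontology, classification scheme
    format: str = ""     # e.g. RDF, XML
    note: str = ""
    email: str = ""
    date: str = ""       # kept as a plain string in this sketch
    sources: List[str] = field(default_factory=list)
    version: str = ""


def latest(records):
    """Pick the record with the highest dotted version number."""
    return max(records,
               key=lambda r: tuple(int(p) for p in r.version.split(".")))


# Hypothetical values, for illustration only.
v1 = KosMetadata(name="Example Thesaurus", acronym="EXT", language="en",
                 kos_type="thesaurus", format="RDF", version="1.2")
v2 = KosMetadata(name="Example Thesaurus", acronym="EXT", language="en",
                 kos_type="thesaurus", format="RDF", version="1.10")

newest = latest([v1, v2])
```

Comparing versions as tuples of numbers rather than as strings matters here: as plain text "1.2" would sort after "1.10", while numerically 1.10 is the later version.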

Here are more details about the KOS registry draft:

Ref: http://staff.oclc.org/~vizine/NKOS/Thesaurus_Registry_version3_rev.htm

Different forms of Knowledge Organization Systems (KOS) and their standards:


Dictionaries, glossaries
ISO 12200:1999, Computer applications in terminology--Machine Readable Terminology
Interchange Format (MARTIF)--Negotiated Interchange
ISO 12620:1999, Computer applications in terminology--Data Categories.

Thesauri
ISO 2788-1986(E) / ANSI/NISO Z39.19-1993(R1998) (www.niso.org)
ZThes (using Z39.50, strictly ANSI Z39.19)
(http://www.loc.gov/z3950/agency/profiles/zthes-04.html)
Browser at http://muffin.indexdata.dk/zthes/tbrowse.zap
Vocabulary Markup Language (VocML) (under discussion at NKOS)
See also http://ceres.ca.gov/KOS/
ISO 5964-1985(E) (multilingual)
USMARC format for authority data
(http://lcweb.loc.gov/marc/authority/ecadhome.html)

Topic maps (reference works, encyclopedias) (http://www.topicmaps.org/about.html)
ISO/IEC 13250:2000 Topic Maps
XML Topic Maps (XTM) 1.0 (http://www.topicmaps.org/xtm/1.0/)
Concept maps

Classification schemes
USMARC format for classification data
http://lcweb.loc.gov/marc/classification/eccdhome.html

Ontologies
Knowledge Interchange Format (KIF) NCITS.T2/98-004
(http://meta2.stanford.edu/kif/dpans.html)
Ontology Markup Language (OML) /
Conceptual Knowledge Markup Language (CKML)
(http://www.ontologos.org/OML/CKML-Grammar.html)
Ontology Interface Layer (OIL) (http://www.ontoknowledge.org/oil/)

Generic standards for knowledge structures, entity-relationship models
Resource Description Framework (RDF) (http://www.w3.org/RDF/)
Metadata Coalition. Open Information Model (OIM). Knowledge Management Model
(http://www.mdcinfo.com/OIM/)
XTM might also fit here

Ref: Dagobert Soergel

If we take the example of a thesaurus registry for KOS, we can define it in the following way:

* termId
* termName
* term Qualifier
* term Language
* term Created Date
* term Modified Date
* term Modified by
* Souce DB

Every group has its own KOS and its own presentation system. The structure/standard depends entirely on how and for what purpose you will use the system.

A vocabulary system or faceted system brings a lot of benefits.
But it cannot currently be utilised to its full potential because the semantic structure is not explicitly represented.

From my point of view, faceted analysis is a big research issue nowadays. I do believe that if we can present and adapt our ontology in a faceted way, we can browse it easily.

I want to stop writing for now; I am thinking a lot about digital preservation of digital documents.

Monday, August 24, 2009

Preserving digital Memories

Today, I was talking with my friend Imma, who is a library management specialist. I asked for her views on digital document preservation. She told me that text documents are preserved as PDF and that viewing them is difficult if the file size is large.

This is a hot issue in the semantic web community: how can we preserve information? For example, dynamic information (satellite pictures or audio information) is very difficult to preserve.

This problem is not limited to the semantic web domain; it covers all domains, for example geographic information, nature pictures, videos, etc.

Previously, people took pictures with analog cameras, developed them and preserved them in albums. But now we take lots of pictures with digital cameras; how many of them do we preserve? From my experience, I have lost a lot of pictures due to a hard disk crash. I will never get those pictures again, since I cannot go back in time.

Also, people feel good when they remember childhood memories. But some people cannot remember everything.

Once I asked a researcher from the UK about this. Her vision was an electromagnetic chip that will help us remember things. Do we think it is enough?

In ancient times, people wrote information on stone. Professor Kurodo says, "Archiving the mountains of digitalised culture heritage we have amassed for the future is paramount."

There are lots of initiatives being taken nowadays:
Recently Yahoo announced that they will help digitise 18,000 works of American literature plus material from national and European archives. That will include books, speeches, audio, video and music.

The news channel BBC also maintains an archive.
Ref: http://www.bbc.co.uk/archive/

The University of Trento and the city of Trento launched a new project, "LiveMemory". The main theme of the project is to preserve the city's information.

The main problem of preservation is format. There is no single format for a picture, video, audio or text. Another problem is file size. Some people offer to store your information or files online, but I think it is not secure.


In my view, we can do the following things:
  • Everybody should agree on a single format. Example: the W3C for RDF, OWL, XML
  • We should start a campaign about "Losing Information in Every Second" through the popular search engines.
  • Build an international forum for digital information (pictures forum, documents forum, etc.)
  • Make a digital repository for every city.
Initially, it will be very expensive, but we should not give up. I am sure that we will invent a small chip that will store a large quantity of information.



Sunday, August 23, 2009

My master thesis

Recently, I got compliments from two people about my master thesis. I was extremely happy to get them.

I did my master thesis at KTH, Sweden, with my friend Ramanjit Singh. He is a very nice guy with a good sense of humor. We started our thesis under Prof. Paul Johannesson and Gudrun Jeppesen Neve. Our teachers were extremely nice and helpful to us. Our thesis was about "Evaluation Ontology Construction tools and Ranking techniques". Initially, we planned to evaluate at least 10 ontology construction tools, but we did not have time. In my case especially, I had got a PhD position at the University of Trento, Italy. I told Prof. Paul about it. He encouraged me about the PhD and told me that we could evaluate 3 tools. We worked hard and presented our thesis.

I have always had a fascination with my thesis. But I was doing completely different things in my PhD studies. After 2 years, I had forgotten about my thesis and the previous work.

When I got a letter from a researcher in the UK and an e-mail from a semantic columnist in the USA, I was filled with wonder and felt extremely good. I wish I could do more research on it. I am remembering my beautiful days at KTH.

Saturday, August 22, 2009

Library Catalog System

A library catalog system keeps records for all bibliographic items.

Bibliographic items:
  • books
  • computer files
  • graphics
  • realia
  • cartographic materials
In the past, librarians kept their records using library cards. We can still find them in some places. With the arrival of the WWW, people are interested in finding any information over the internet (you can download or access a book in PDF format, so you do not need to go to the library physically).

History:
As far as I know from history, library catalogues were introduced in the House of Wisdom. At that time, there was a big collection of books from the 7th and 8th centuries in Iraq during the Islamic Renaissance. They used a totally different catalogue system in their library.

Later on, Hulagu Khan attacked Iraq and destroyed and burnt all the books.

Ref:http://liswiki.org/wiki/History_of_the_card_catalog

Here are some wonderful tools for cataloging:
Ref : http://www.lib.berkeley.edu/Catalogs/

Holy Ramadan

Allah has given us lots of things, but we always forget to show our respect and make sacrifices to him. Today is our holy Ramadan. I am praying to God so that I can keep my fast. Generally, people think you cannot work if you are fasting. I myself believe that I do my best work in Ramadan. Anyway, back to my thesis writing.

Friday, August 21, 2009

Semantic heterogeneity and factors

I have been thinking of writing about semantic heterogeneity for a long time. It is a big problem for semantic matching, and my thesis is on matching between two controlled vocabularies.

In short, I found some factors behind the heterogeneity problem:

  • Time (vocabulary changes from time to time).
  • Place (after 50 miles, a new language starts; for example, the Italian spoken by Trentino people is different from that of Bolzano people, and in India there is a distinct language for every state).
  • Cultural diversity (English people write "centre", American people write "center").
  • Structure of vocabulary (there is no single representation of vocabulary; for example, some people use RDF files, some people use XML files).

Types of heterogeneity problem:
  • Syntactic heterogeneity
  • Terminological heterogeneity (Paper vs Article)
  • Conceptual heterogeneity (also called semantic heterogeneity)
  • Semiotic heterogeneity
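
To make the terminological case concrete, here is a minimal sketch of how two terms such as "centre"/"center" or "Paper"/"Article" might be brought to a common form before comparison. The two lookup tables and the function names are assumptions invented for this illustration; a real matcher would load such variants from a lexical resource.

```python
# Hypothetical lookup tables; a real system would load these from a
# thesaurus or lexical resource rather than hard-code them.
SPELLING_VARIANTS = {"centre": "center", "colour": "color"}
SYNONYMS = {"article": "paper"}


def normalize(term):
    """Map a term to a canonical form: lowercase, one spelling, one synonym."""
    t = term.strip().lower()
    t = SPELLING_VARIANTS.get(t, t)
    return SYNONYMS.get(t, t)


def same_term(a, b):
    """True if both terms normalize to the same canonical form."""
    return normalize(a) == normalize(b)
```

With these tables, same_term("Centre", "center") and same_term("Paper", "Article") both hold. This only addresses spelling and synonym differences; conceptual heterogeneity needs knowledge about what the terms mean.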

Sunday, August 16, 2009

Is social networking a problem?

At night, this question came to my mind: is social networking good, bad, or just time consuming? I thought I would write something on it.


As we know, social networking sites (Facebook, MySpace, friends, hi5) are growing in popularity every day. Are they popular only with people aged 15-20, or also 21-30, 31-48 and so on? I think that mostly people aged 17-19 are crazy about Facebook. I did a small survey on it. I asked a couple of people and found out that they basically use Facebook to make new friends. Guys poke girls, and girls also look at guys' faces and physiques. I asked myself, "Is it useful, not useful, or just time consuming?" In one sense it is useful: you can make good new friends. On the other hand, it is simply time consuming, and you could use this time for other purposes. So it does have an impact on young people at an early age.


If you consider the 21-35 group, you find that most people use it to communicate with their school, college or university friends or their office colleagues. I do not support people who browse Facebook at the office. If you spend 15 or 20 minutes every day on Facebook, you can waste an hour or more of office time in a week. There was news about an office worker who told his boss that he was sick, but the boss saw him on Facebook, got angry and fired him. Cases like this are appearing nowadays. I am not saying you should not use it. You can use it, but you should also respect your office hours and work.

I think we should use social networking sites after office hours, or use them for learning or group meetings.

But the 36-48 age group uses social networking sites to find their old school buddies. They are not on them so frequently. They are happy to see and communicate with people.

However, we can make it useful:
1. A teacher can open a page for his course and hold open discussions through it.
2. We can schedule meetings within a group.
3. We can publish useful information so that people can gain some knowledge.

Saturday, August 15, 2009

Pain and Research

Today, I got out of bed very late. I had terrible tooth pain during the night. After having my breakfast, I was writing something on ontology matching techniques for my thesis. I want to share it with those who read my blog. I hope they will pray for me for my thesis presentation.

Ontology matching techniques can be classified as follows:
  • Element-level Matching
  • Corpus-based Matching
  • Knowledge-based Matching
  • Semantic Matching
In an element-level matching system, we start the process by comparing two strings. There are several existing methods for comparing strings:
  1. Prefix
  2. Suffix
  3. Edit distance
  4. N-gram
For more interested readers, I suggest reading the book "Ontology Matching" by Jérôme Euzenat and Pavel Shvaiko.
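
The four string comparison methods listed above can be sketched in a few lines each. These are simple textbook variants written for illustration, not the exact definitions used by any particular matching tool; in particular, the n-gram measure here is the Jaccard overlap of character trigram sets, which is one common choice among several.

```python
def is_prefix(a, b):
    """True if one string is a prefix of the other (e.g. 'net' / 'network')."""
    a, b = a.lower(), b.lower()
    return a.startswith(b) or b.startswith(a)


def is_suffix(a, b):
    """True if one string is a suffix of the other (e.g. 'ID' / 'PhoneID')."""
    a, b = a.lower(), b.lower()
    return a.endswith(b) or b.endswith(a)


def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def ngrams(s, n=3):
    """Set of character n-grams of a lowercased string."""
    s = s.lower()
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def ngram_similarity(a, b, n=3):
    """Jaccard overlap of character n-gram sets, between 0.0 and 1.0."""
    ga, gb = ngrams(a, n), ngrams(b, n)
    if not ga or not gb:
        return 0.0
    return len(ga & gb) / len(ga | gb)
```

For example, edit_distance("centre", "center") is 2, so a matcher with a small distance threshold would treat the two spellings as candidates for the same term.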

In corpus-based matching, a large corpus is involved. Tokens are the most important element in this kind of matching. We can find matches using:

  • LSI(Latent Semantic Indexing)
  • Cluster Code Difference
  • Formal Concept Analysis
  • Common Instance Comparison
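
Of the techniques in this list, common instance comparison is the easiest to sketch: two concepts are considered similar when they share many instances. The example below uses Jaccard similarity as the overlap measure, which is an assumption of this sketch, and the document identifiers are invented.

```python
def instance_overlap(instances_a, instances_b):
    """Jaccard similarity between the instance sets of two concepts."""
    a, b = set(instances_a), set(instances_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical document identifiers annotated with each concept.
paper_docs = {"doc1", "doc2", "doc3", "doc4"}
article_docs = {"doc2", "doc3", "doc4", "doc5"}
recipe_docs = {"doc9"}

score_close = instance_overlap(paper_docs, article_docs)  # 3 shared out of 5
score_far = instance_overlap(paper_docs, recipe_docs)     # nothing shared
```
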
In knowledge-based matching, external resources are involved, e.g. WordNet, thesauri, taxonomies, etc.

In semantic matching, match acts as an operator that takes two graphs and produces a mapping, but it depends on knowledge-based techniques as well. I think semantic matching cannot be accomplished without the help of a knowledge base. For semantic matching, I suggest reading the paper
"Semantic Matching" By Pavel Shvaiko and Fausto Giunchiglia

I think that matching is one of the hardest tasks. You cannot achieve 100% correct matching results with an automatic matcher. I am not a pessimistic person; I am optimistic. I am sure that we can overcome all these problems. It will be a major breakthrough for the heterogeneity problem in data integration.
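
The idea of match as an operator that takes two graphs and produces a mapping can be illustrated with a heavily simplified sketch. Here each graph is reduced to its node labels, and a tiny hand-written knowledge base stands in for a resource such as WordNet; the relation symbols, the table contents and the function names are all assumptions made for this illustration. A real semantic matcher would also use the position of each node in its graph.

```python
# Hypothetical background knowledge: (label, label) -> relation.
# "=" equivalent, "<" first is less general, ">" first is more general.
KNOWLEDGE_BASE = {
    ("paper", "article"): "=",
    ("maize", "cereals"): "<",
    ("europe", "italy"): ">",
}
INVERSE = {"=": "=", "<": ">", ">": "<"}


def relation(label_a, label_b):
    """Look up the semantic relation between two labels, if one is known."""
    a, b = label_a.lower(), label_b.lower()
    if a == b:
        return "="
    if (a, b) in KNOWLEDGE_BASE:
        return KNOWLEDGE_BASE[(a, b)]
    if (b, a) in KNOWLEDGE_BASE:
        return INVERSE[KNOWLEDGE_BASE[(b, a)]]
    return None


def match(graph_a, graph_b):
    """Take two graphs (node id -> label) and return a mapping as a list of
    (node in A, node in B, relation) triples."""
    mapping = []
    for node_a, label_a in graph_a.items():
        for node_b, label_b in graph_b.items():
            rel = relation(label_a, label_b)
            if rel is not None:
                mapping.append((node_a, node_b, rel))
    return mapping


graph_a = {"a1": "Paper", "a2": "Maize"}
graph_b = {"b1": "Article", "b2": "Cereals", "b3": "Italy"}
mapping = match(graph_a, graph_b)
```

Without the knowledge base, none of these pairs would be found by string comparison alone, which is why the operator depends on it.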




Tuesday, August 11, 2009

Different Kind of Controlled Vocabularies

I think several kinds of controlled vocabulary exist, e.g. thesauri, ontologies, subject schemas and catalogs. These vocabularies play an important role in information communication systems and library systems.

Thesaurus: It is used in library science and knowledge organization systems.

Subject Schema: It is mainly used in text categorization or document annotation.

Catalogs: Library science + entertainment industry

All of these are used for searching, information extraction, etc.

In the past, mainly library science people used controlled vocabularies. With the arrival of ontologies, people understood the need for controlled vocabularies for commercial or research purposes. A major problem is the lack of universal controlled vocabularies: there is no universal controlled vocabulary for any specific domain. I must say that we need one very soon, especially for medical science, agricultural science and other specific fields. I keep my fingers crossed for a universal controlled vocabulary.

As I am a researcher in this field, I am looking forward to concrete research or volunteer work on it.

Monday, August 10, 2009

Controlled Vocabulary

Several things come to mind when we hear about controlled vocabulary. In order to simplify my thoughts about controlled vocabularies, I planned to capture them in my blog:

  • What is a controlled vocabulary?
  • Why do we need them in real life?
  • How can you build a controlled vocabulary?
To begin with, a controlled vocabulary can be a kind of database, ontology, thesaurus, Yellow Pages, classification scheme, etc. The simple definition of a controlled vocabulary is a set of concepts and their relationships.

CV = Concepts + Relationships

For example, on Flickr, if people tag their photos according to predefined keywords, then it is easy to find the information. One person gives the keyword "Trento" and puts all the pictures under Trento. Here "Trento" is working as a controlled word.
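
The definition "concepts plus relationships" and the photo tagging example can be put together in a small sketch. The class name ControlledVocabulary, its methods, the single "broader" relationship and the Italy/Europe hierarchy are assumptions chosen for illustration only.

```python
class ControlledVocabulary:
    """A set of concepts plus one kind of relationship (broader concept)."""

    def __init__(self):
        self.concepts = set()
        self.broader = {}  # concept -> its broader concept

    def add(self, concept, broader=None):
        self.concepts.add(concept)
        if broader is not None:
            self.concepts.add(broader)
            self.broader[concept] = broader

    def is_valid_tag(self, tag):
        """Only terms that are in the vocabulary may be used as tags."""
        return tag in self.concepts

    def ancestors(self, concept):
        """Follow broader links upward from a concept."""
        chain = []
        while concept in self.broader:
            concept = self.broader[concept]
            chain.append(concept)
        return chain


cv = ControlledVocabulary()
cv.add("Italy", broader="Europe")
cv.add("Trento", broader="Italy")

photo_tags = ["Trento", "treno"]  # "treno" is a mistyped tag
accepted = [t for t in photo_tags if cv.is_valid_tag(t)]
```

Because only predefined keywords are accepted, a mistyped tag is rejected instead of silently creating a second group of photos, and the broader links let a search for "Italy" also reach pictures tagged "Trento".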

There are many existing applications of controlled vocabularies:

  • Information Extraction
  • Information browsing
  • Searching information
There are several approaches to building controlled vocabularies. I will describe them next time.. going to run now.. ..:(

Sunday, August 9, 2009

Ontology Construction and Evaluation

I am writing on this topic because I am very much interested in it. At first, we need to clarify some questions:
  1. What is an ontology?
  2. What is an ontology tool?
  3. Why do we need to evaluate it?
From my point of view, an ontology is a conceptual representation that captures one scenario of the real world.
But the famous definition is "An explicit specification of a conceptualization".


There are lots of ontology construction tools around the world. The most famous tool is Protégé from Stanford University.

Ontology construction is very costly and time consuming. For example, if we want to capture all the knowledge of one organization, the solution is an ontology. So we need to evaluate ontology construction tools so that we can design ontologies at minimum cost.