Universidad Politécnica de Madrid

Big boost for Spanish Semantic Wikipedia

The Spanish edition of DBpedia mushroomed from four to over 100 types of mapped infoboxes and from 20,000 to over 400,000 mapped pages.

On 3 and 4 November, Oscar Corcho, from the Ontology Engineering Group of the Facultad de Informática (UPM) and leader of the Spanish Linked Data Thematic Network, organized a marathon to create mapping descriptions for Spanish Wikipedia and so generate data for the Spanish edition of DBpedia. The marathon was co-organized by Mariano Rico, Spanish language leader on the DBpedia internationalization committee.

This activity is part of the worldwide "language race" to create different language editions of DBpedia. Fifteen people took part in the event, some from public and private institutions (Universidad Politécnica de Madrid, iSOCO, Universidad Autónoma de Madrid) and others as private individuals.

The marathon was an out-and-out success, resulting in a sharp increase in the information available in the Spanish edition of DBpedia. The number of mapped infobox types grew from just four to over 100, and the number of mapped pages from 20,000 to over 400,000.

DBpedia

DBpedia is a project for extracting data from Wikipedia and building a semantic version of this Internet encyclopedia. It is a community effort to extract structured information from Wikipedia and make it accessible on the Web, so that the harvested knowledge can be processed by computer programs.

DBpedia is a gigantic structured database built from information entered by people from all over the world in what are termed templates or infoboxes on many Wikipedia pages. These infoboxes are displayed as a boxed item on the right of Wikipedia pages. For example, the infobox on the Spanish web page for the city of Madrid contains the flag, coat of arms, photos of monuments and sites, and information on the population, local government, post codes, etc.
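As a rough illustration of what a mapping does, the sketch below turns a simplified infobox into subject/property/value statements. It is only a minimal sketch under stated assumptions, not DBpedia's actual extraction code: the template name, the field values, the `MAPPING` table and the helper functions `parse_infobox` and `to_triples` are all hypothetical placeholders.

```python
import re

# Hypothetical, simplified infobox wikitext; all values are placeholders.
WIKITEXT = """{{Ficha de localidad
| nombre = Madrid
| población = 3265038
| código_postal = 28001-28080
| escudo = Escudo.svg
}}"""

# Hypothetical mapping from Spanish infobox keys to ontology-style
# property names. Keys without an entry here are simply not extracted.
MAPPING = {
    "nombre": "name",
    "población": "populationTotal",
    "código_postal": "postalCode",
}


def parse_infobox(wikitext):
    """Return (template_name, {key: value}) for one flat infobox."""
    lines = wikitext.strip().splitlines()
    template = lines[0].lstrip("{").strip()
    fields = {}
    for line in lines[1:]:
        # Each field line looks like "| key = value".
        match = re.match(r"\|\s*([^=]+?)\s*=\s*(.*)", line)
        if match:
            fields[match.group(1)] = match.group(2).strip()
    return template, fields


def to_triples(subject, fields, mapping):
    """Keep only mapped keys and emit (subject, property, value) triples."""
    return [(subject, mapping[k], v) for k, v in fields.items() if k in mapping]


template, fields = parse_infobox(WIKITEXT)
triples = to_triples("Madrid", fields, MAPPING)
for triple in triples:
    print(triple)
```

In this toy version, the unmapped `escudo` field is dropped. That is one way to picture why adding mappings for more infobox types increases the number of pages that yield structured data.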

Spanish data growth

Measured by volume of data in DBpedia, Spanish has climbed nine of the 15 rungs of the mappings ladder and is now one of the top three languages, a position equivalent to its Wikipedia ranking (in terms of number of entries).

Thanks to this initiative, the entire Spanish-speaking community will benefit from this gigantic database in applications such as:

- Sem4Tags, a tool that identifies which DBpedia resource a user's tag refers to on social media sites such as Flickr, YouTube and Facebook.

- DBpedia Spotlight, developed in partnership with members of the FUB; a Spanish version is in the pipeline.

- Identifying topics in tweets.

- Teaching scientific disciplines, where students can extract definitions from this database for their models of ecological and environmental systems, etc.

Source: FIUPM