Ad verba per numeros


Thursday, January 17, 2013, 05:31 PM
Warning: quick and dirty post.

Lexicon matching is by no means the most accurate way of performing sentiment analysis; however, it is one of the easiest ways of implementing a quick prototype.

Needless to say, that requires a lexicon, and lexicons tend to be scarce, small, and mostly limited to the English language.
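
For readers who haven't seen lexicon matching before, here is a minimal sketch of the idea in Python. Everything in it is an assumption made for illustration: the toy lexicon, its scores, and the function name `lexicon_score` are invented, and averaging the scores of the matched words is just one simple way to combine them.

```python
import re

# Hypothetical toy lexicon: word -> sentiment score.
# The words and values are invented for illustration only.
TOY_LEXICON = {"happy": 8.0, "good": 7.0, "sad": 2.0, "terrible": 1.5}


def lexicon_score(text, lexicon=TOY_LEXICON):
    """Return the mean score of the words found in the lexicon.

    Returns None when no word of the text appears in the lexicon.
    """
    # Lowercase the text and keep runs of letters (accented ones included).
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    scores = [lexicon[token] for token in tokens if token in lexicon]
    if not scores:
        return None
    return sum(scores) / len(scores)


print(lexicon_score("A happy, good day"))   # matches "happy" and "good"
print(lexicon_score("Nothing to see here")) # no lexicon word matches
```

Note that words missing from the lexicon are simply ignored, which is why the size of the lexicon matters so much for this approach.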

I've just heard of the corpus by Warriner et al., which covers almost 14,000 English words, and I've tried to prepare a quick "translation" into Spanish.

The method I've applied is extremely crude:

  1. Translate the list of words from English into Spanish.
  2. Translate (again) the list of Spanish words into English.
  3. Check that the original English word and the English-to-Spanish-to-English word are the same.

In addition to that, there has been some manual checking but, as I say, it is pretty crude and is provided as is.
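
The three steps above can be sketched roughly as follows. This is not the script actually used for this post: the function name `round_trip_filter` is hypothetical, and the two small dictionaries only stand in for whatever machine-translation service you have at hand.

```python
def round_trip_filter(words, en_to_es, es_to_en):
    """Keep the words whose back-translation equals the original.

    en_to_es and es_to_en are callables that return a translation,
    or None when they have none. Returns a dict english -> spanish.
    """
    kept = {}
    for word in words:
        spanish = en_to_es(word)        # step 1: English -> Spanish
        if spanish is None:
            continue
        back = es_to_en(spanish)        # step 2: Spanish -> English
        # Step 3: keep the pair only if the round trip is unchanged.
        if back is not None and back.strip().lower() == word.strip().lower():
            kept[word] = spanish
    return kept


# Toy dictionaries standing in for a real translator (assumption).
EN_ES = {"house": "casa", "happy": "feliz", "glad": "contento", "pretty": "bonito"}
ES_EN = {"casa": "house", "feliz": "happy", "contento": "content", "bonito": "pretty"}

pairs = round_trip_filter(EN_ES, EN_ES.get, ES_EN.get)
print(pairs)  # "glad" is dropped: it comes back as "content"
```

The check is strict on purpose, so it throws away perfectly good translations whenever the way back picks a synonym. That would explain why a filter like this shrinks the word list considerably.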

So, here you are, the Warriner et al. corpus machine-translated into Spanish with a little more than 9,000 words. Enjoy!

As usual, you can find me at PFCdgayo for any comments regarding this post.


