Thursday, March 8, 2012

Free stemming API

Assalamualaikum / Hello / Hola / Bonjour,

It's been a while since my last article here. Today I'm coming with something really interesting and very useful that I had been looking for for a while: an API that groups many stemming algorithms, supports many languages, and can be used with nothing more than curl. You give it the word, the language and the algorithm, and you get the stem back like magic. What's very cool is that all the algorithms are open source. Yes, you can download them, change them if necessary, and then use them. Isn't that cool?
What?
Well, if you don't know what stemming is, let me give a short definition: it's the process of reducing a word to its base form, called the stem. It's used in search engines and dictionaries.

'play', 'playing' and 'plays', for example, have one common base word, which is 'play'. More details are at http://en.wikipedia.org/wiki/Stemming
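
To make the idea concrete, here is a tiny sketch of suffix stripping in PHP. It is only a toy: the function name naive_stem and its hand-picked suffix list are my own assumptions, and it is not the Porter algorithm or anything the API uses. It just shows how the three words above can collapse to the same stem.

```php
<?php
// Toy suffix stripper: NOT the Porter algorithm, only an illustration.
function naive_stem($word) {
    $word = strtolower($word);
    // Hypothetical, hand-picked suffix list; longest suffixes come first.
    $suffixes = array('ing', 'ed', 's');
    foreach ($suffixes as $suffix) {
        $cut = strlen($word) - strlen($suffix);
        // Keep at least three letters so short words like "is" are left alone.
        if ($cut >= 3 && substr($word, $cut) === $suffix) {
            return substr($word, 0, $cut);
        }
    }
    return $word; // no known suffix found
}

foreach (array('play', 'playing', 'plays') as $word) {
    echo $word . ' => ' . naive_stem($word) . "\n";
}
```

Real stemmers handle many more cases (doubled consonants, irregular endings, other languages), which is exactly why a ready-made API is handy.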

You can find the API documentation at this link: http://text-processing.com/docs/stem.html
And here is an example of using it in PHP.
PS: Of course, you need to have the PHP curl extension installed.

<?php
function get_stem_word($parameters) {
    $curl = curl_init();

    // Build the request headers
    $header = array();
    $header[0] = "Accept: text/xml,application/xml,application/xhtml+xml,";
    $header[0] .= "text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5";
    $header[] = "Cache-Control: max-age=0";
    $header[] = "Connection: keep-alive";
    $header[] = "Keep-Alive: 300";
    $header[] = "Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7";
    $header[] = "Accept-Language: en-us,en;q=0.5";

    curl_setopt($curl, CURLOPT_URL, 'http://text-processing.com/api/stem/');
    curl_setopt($curl, CURLOPT_USERAGENT, 'Googlebot/2.1 (+http://www.google.com/bot.html)');
    curl_setopt($curl, CURLOPT_HTTPHEADER, $header);
    curl_setopt($curl, CURLOPT_REFERER, 'http://www.google.com');
    curl_setopt($curl, CURLOPT_POST, 1);
    curl_setopt($curl, CURLOPT_POSTFIELDS, http_build_query($parameters));
    curl_setopt($curl, CURLOPT_ENCODING, 'gzip,deflate');
    curl_setopt($curl, CURLOPT_AUTOREFERER, true);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($curl, CURLOPT_TIMEOUT, 10);

    $result = curl_exec($curl); // execute the curl command
    curl_close($curl); // close the curl connection

    return json_decode($result);
}

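$stem_object = get_stem_word(array('text' => 'playing', 'language' => 'english', 'stemmer' => 'porter'));
// json_decode() returns null when the request failed or the response was not valid JSON
if ($stem_object !== null && isset($stem_object->text)) {
    echo $stem_object->text; // the stemmed word
} else {
    echo 'Could not get the stem.';
}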

Parameters:
    text: the word whose stem you want. Required.
    language: the language of the word. Optional, the default value is english.
    stemmer: the stemming algorithm. Optional, the default value is porter.
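
If you want to avoid typing the optional values every time, a small helper can fill them in. This is only a sketch: build_stem_parameters is a hypothetical name of my own, and the defaults are simply the ones listed above.

```php
<?php
// Hypothetical helper: fills in the documented defaults and checks that 'text' is present.
function build_stem_parameters($text, $language = 'english', $stemmer = 'porter') {
    if (!is_string($text) || trim($text) === '') {
        throw new InvalidArgumentException('The "text" parameter is required.');
    }
    return array(
        'text'     => trim($text),
        'language' => $language,
        'stemmer'  => $stemmer,
    );
}

// http_build_query() produces the form-encoded body that CURLOPT_POSTFIELDS sends.
echo http_build_query(build_stem_parameters('playing')) . "\n";
// prints: text=playing&language=english&stemmer=porter
```

The returned array can be passed straight to get_stem_word().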

Bounmed,