It's been a while since my last article here. Today I'm coming back with something really interesting and very useful that I had been looking for for a while: an API that groups several stemming algorithms. It supports many languages, and you can use it with nothing more than curl. You give it the word, the language and the algorithm, and you get the stem back like magic. What's very cool is that the algorithms are all open source, so you can download one, change it if necessary, and then use it. Isn't that cool?
What?
Well, if you don't know what stemming is, let me give a short definition: it's the process of reducing a word to its base (root) form, called the stem. It's used in search engines and dictionaries.
For example, 'play', 'playing' and 'plays' share one common base word, which is 'play'. More details are at http://en.wikipedia.org/wiki/Stemming
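To make the idea concrete, here is a minimal sketch of a naive suffix-stripping stemmer in PHP. It is not the Porter algorithm and not what the API runs; the function name naive_stem and the short suffix list are made up for illustration, and real stemmers apply many more rules.

```php
<?php
// Hypothetical toy stemmer: strips a few common English suffixes.
// This is NOT the Porter algorithm, only an illustration of the idea.
function naive_stem($word) {
    $word = strtolower($word);
    // Longer suffixes are listed first so that "ing" is tried before "s".
    $suffixes = array('ing', 'ed', 's');
    foreach ($suffixes as $suffix) {
        $length = strlen($suffix);
        // Only strip when at least three letters remain, so short words stay intact.
        if (strlen($word) - $length >= 3 && substr($word, -$length) === $suffix) {
            return substr($word, 0, -$length);
        }
    }
    return $word;
}

foreach (array('play', 'playing', 'plays') as $word) {
    echo $word . ' -> ' . naive_stem($word) . "\n";
}
```

All three words come out as 'play' here, but a rule set this small breaks quickly on other words, which is why it is better to rely on a tested algorithm.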
You can find the API documentation at this link: http://text-processing.com/docs/stem.html
And here is an example of using it in PHP:
PS: Of course, you need to have the PHP cURL extension installed.
<?php
// Sends the given parameters to the stemming API and returns the decoded
// JSON response as an object, or null if the request fails.
function get_stem_word($parameters) {
    $curl = curl_init();
    // Browser-like request headers (initialise the array before appending to it).
    $header = array();
    $header[] = "Accept: text/xml,application/xml,application/xhtml+xml,"
              . "text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5";
    $header[] = "Cache-Control: max-age=0";
    $header[] = "Connection: keep-alive";
    $header[] = "Keep-Alive: 300";
    $header[] = "Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7";
    $header[] = "Accept-Language: en-us,en;q=0.5";
    curl_setopt($curl, CURLOPT_URL, 'http://text-processing.com/api/stem/');
    curl_setopt($curl, CURLOPT_USERAGENT, 'Googlebot/2.1 (+http://www.google.com/bot.html)');
    curl_setopt($curl, CURLOPT_HTTPHEADER, $header);
    curl_setopt($curl, CURLOPT_REFERER, 'http://www.google.com');
    curl_setopt($curl, CURLOPT_POST, 1);
    // Send the parameters as a form-encoded POST body.
    curl_setopt($curl, CURLOPT_POSTFIELDS, http_build_query($parameters));
    curl_setopt($curl, CURLOPT_ENCODING, 'gzip,deflate');
    curl_setopt($curl, CURLOPT_AUTOREFERER, true);
    // Return the response as a string instead of printing it.
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($curl, CURLOPT_TIMEOUT, 10);
    $result = curl_exec($curl); // execute the request
    curl_close($curl); // close the curl handle
    if ($result === false) {
        return null; // network error or timeout
    }
    return json_decode($result);
}
$stem_object = get_stem_word(array('text' => 'playing', 'language' => 'english', 'stemmer' => 'porter'));
// The stem is in the "text" property of the response.
if ($stem_object !== null && isset($stem_object->text)) {
    echo $stem_object->text;
}
Parameters:
text: the word you want the stem of; required.
language: the language of the word; optional, the default value is english.
stemmer: the stemming algorithm; optional, the default value is porter.
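The parameter rules above can be sketched as a small helper that builds the POST body. The helper name build_stem_query is hypothetical. The defaults are filled in on the client side only to make them visible; according to the list above, the API applies the same defaults when those fields are left out.

```php
<?php
// Hypothetical helper: builds the form-encoded POST body for the stemming API.
function build_stem_query($text, $language = 'english', $stemmer = 'porter') {
    if ($text === '') {
        // "text" is the only required parameter.
        throw new InvalidArgumentException('The text parameter is required.');
    }
    // Pass '&' explicitly so the result does not depend on php.ini settings.
    return http_build_query(array(
        'text' => $text,
        'language' => $language,
        'stemmer' => $stemmer,
    ), '', '&');
}

echo build_stem_query('playing') . "\n";
// prints: text=playing&language=english&stemmer=porter
```

The returned string can be passed straight to CURLOPT_POSTFIELDS. For the other accepted language and stemmer values, check the API documentation linked above.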
Bounmed,