Sastrawi Bindings for Ruby Build Status Gem Version

sastrawi-ruby is Ruby bindings for Sastrawi, a library which allows you to stem words in Bahasa Indonesia. The original implementation of Sastrawi was written in PHP and this library is written in Ruby language.

Taken from Wikipedia, stemming is the process of reducing inflected (or sometimes derived) words to their word stem, base or root form. For instance, “menahan” has “tahan” as its base form. If you want to know how stemming works, please read this page (in Bahasa Indonesia) for further details.

Demo

The demo version of sastrawi-ruby can be accessed here.

Documentation

Documentation for this library is available on here. You can also check sastrawi-ruby GitHub Wiki that contains TODO list.

Installation

There are two options to install this library. First, if you just want to use Ruby bindings for Sastrawi, add this line to your application’s Gemfile:

gem 'sastrawi'

and then execute:

$ bundle install

or you can install directly:

$ gem install sastrawi

Note that, this library requires Ruby. Ruby 2.3 series or above should be installed on your system. I would recommend to use the stable versions.

Usage

This library supports stemming words with provided base forms.

require 'sastrawi'

# create stemmer
stemmer_factory = Sastrawi::Stemmer::StemmerFactory.new
stemmer = stemmer_factory.create_stemmer

# prepare a sentence or words to be stemmed and call the stem API
sentence = 'Perekonomian Indonesia sedang dalam pertumbuhan yang membanggakan.'
stemming_result = stemmer.stem(sentence)

# the stemming result should be "ekonomi indonesia sedang dalam tumbuh yang bangga"
puts stemming_result

Beside that, you can add or remove any base form.

require 'sastrawi'

# create stemmer
stemmer_factory = Sastrawi::Stemmer::StemmerFactory.new

# create default dictionary and add a text file that contains words into it
dictionary = stemmer_factory.create_default_dictionary
dictionary.add_words_from_text_file('my-dictionary.txt')

# add or remove words
dictionary.add('internet')
dictionary.remove('desa')

# stem a word, "internetan", for example
stemmer = Sastrawi::Stemmer::Stemmer.new(dictionary)

# the stemming result should be "internet"
puts stemmer.stem('internetan')

Contributing

Contributions are welcome. Please, read CONTRIBUTING guidelines.

License

This library is released under the terms of MIT License. See the LICENSE file for more details. sastrawi-ruby contains base form of words from Kateglo and it is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.