Term Weighting

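Metode Pembobotan Kata Berbasis Sebaran untuk Temu Kembali Informasi Dokumen Bahasa Indonesia [Distribution-Based Term Weighting Method for Information Retrieval of Indonesian-Language Documents]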

Sari, Putri Dewi Purnama. 2012.

Term weighting algorithms play an important role in document search and strongly influence the precision and recall of a search engine. Currently, the TF-IDF term weighting algorithm is widely applied in language models to build search engine systems. Because term frequency is not the only discriminator that must be considered for each weight to reflect a term's importance, term weighting algorithms based on term distribution have been developed. Within a single document, a term with higher frequency and a distribution closer to hypo-dispersion usually carries more semantic information and should be given a higher weight. On the other hand, across a collection of documents, a term with higher frequency and a hypo-dispersed distribution usually carries less information. This research implements term weighting based on term distribution, using a Local Term Weight Algorithm and a Global Term Weight Algorithm, for documents in the Indonesian language. The result of this research is a search engine with an average precision of 84.8%.


Studi Komparatif Pembobotan Kata untuk Temu Kembali Informasi Dokumen Bahasa Indonesia [A Comparative Study of Term Weighting for Information Retrieval of Indonesian-Language Documents]

Anugrah, Hafizhia Dhikrul. 2013.

One of the classical models for Information Retrieval (IR) systems is the vector space model. A vector in this model represents the weights of the terms contained in a document or query. A term can be a word, a phrase, or another unit in a document that describes the document's context. Since each term has a different level of importance in a document, weighting is needed. The most commonly used weighting method is TF-IDF (Term Frequency-Inverse Document Frequency). Previous research indicates that the distribution of terms follows the Poisson distribution, which led to the development of a method called RIDF (Residual Inverse Document Frequency). Another approach, which takes the query into account, is called Query Term Weighting. Neither of the latter two methods has previously been implemented for documents in Indonesian. This research implements TF-IDF, RIDF, and Query Term Weighting in a search engine for documents in Indonesian. The result of this research is a search engine with an average precision of 63.9%.
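
To make the two collection-level weights named in this abstract concrete, here is a minimal sketch. It assumes the commonly cited definition of RIDF as the observed IDF minus the IDF that a Poisson model would predict from a term's collection frequency. The toy documents, the function names, and the choice of a base-2 logarithm are illustrative assumptions, not details taken from the thesis.

```python
import math
from collections import Counter

# Toy tokenized collection (hypothetical data, for illustration only).
docs = [
    ["padi", "sawah", "padi", "pupuk"],
    ["pupuk", "tanah"],
    ["padi", "padi", "padi", "hama"],
    ["tanah", "pupuk", "air"],
]
n_docs = len(docs)

# df: number of documents containing the term.
# cf: total number of occurrences of the term in the collection.
df = Counter(term for doc in docs for term in set(doc))
cf = Counter(term for doc in docs for term in doc)


def idf(term):
    """Inverse document frequency, log base 2 (an assumed convention)."""
    return math.log2(n_docs / df[term])


def tf_idf(term, doc):
    """Raw term frequency in one document times IDF."""
    return doc.count(term) * idf(term)


def ridf(term):
    """Observed IDF minus the IDF expected under a Poisson model.

    Under Poisson with rate cf/N, the chance that a document contains
    the term at least once is 1 - exp(-cf/N).
    """
    rate = cf[term] / n_docs
    expected_idf = -math.log2(1.0 - math.exp(-rate))
    return idf(term) - expected_idf


for term in sorted(df):
    print(f"{term}: idf={idf(term):.3f} ridf={ridf(term):.3f}")
```

In this sketch a term whose occurrences are concentrated in few documents (here "padi") receives a higher RIDF than a term spread one occurrence per document (here "pupuk"), which is the behavior the residual is meant to capture.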


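Mesin Pencari Dokumen Bahasa Indonesia Menggunakan Latent Semantic Indexing dengan Pembobotan Global [A Search Engine for Indonesian-Language Documents Using Latent Semantic Indexing with Global Weighting]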

Handayani, Susi. 2012.

Users today tend to prefer search engines that are based on word semantics, because word choice is subject to synonymy and polysemy. One technique for resolving these issues is Latent Semantic Indexing (LSI). LSI can find relevant documents even if the words of the query do not appear in the document. Currently, the TF-IDF term weighting algorithm is widely applied in search engines. Xia and Chai (2011) stated that, in a document collection, a term with higher frequency and a hypo-dispersed distribution usually carries less information. The purpose of this research is to implement LSI using the Singular Value Decomposition (SVD) method with a term-distribution-based global term weight. This research used 1000 Indonesian agricultural documents. The search engine using LSI with the term-distribution-based global term weight achieved a highest average precision of about 40.47%. The test results also showed that LSI with the term-distribution-based global term weight gives better accuracy than LSI with TF-IDF.
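
As a rough illustration of how LSI uses SVD, the sketch below builds a rank-k latent space from a small term-document matrix, folds a query into that space, and scores documents by cosine similarity. It assumes NumPy is available and uses raw term counts; the thesis applies a term-distribution-based global weight whose formula is not given here, so that weighting is not reproduced. The matrix, the vocabulary, and the function names are hypothetical.

```python
import numpy as np

# Rows are terms, columns are documents (hypothetical raw counts).
terms = ["padi", "beras", "pupuk", "hama"]
A = np.array([
    [2.0, 0.0, 1.0, 0.0],
    [1.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 2.0, 1.0],
    [0.0, 0.0, 1.0, 2.0],
])


def lsi_fit(matrix, k):
    """Truncated SVD: keep the k largest singular triplets."""
    U, s, Vt = np.linalg.svd(matrix, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :]


def fold_in(vec, U_k, s_k):
    """Project a term-space vector into the k-dimensional latent space."""
    return (U_k.T @ vec) / s_k


def score(query, U_k, s_k, Vt_k):
    """Cosine similarity between the folded-in query and each document."""
    q_k = fold_in(query, U_k, s_k)
    docs_k = Vt_k.T  # one row of latent coordinates per document
    norms = np.linalg.norm(docs_k, axis=1) * np.linalg.norm(q_k)
    return docs_k @ q_k / (norms + 1e-12)  # small constant avoids 0/0


U_k, s_k, Vt_k = lsi_fit(A, k=2)

# Query containing only the term "padi".
query = np.array([1.0, 0.0, 0.0, 0.0])
sims = score(query, U_k, s_k, Vt_k)
print(np.argsort(-sims), sims)
```

Because documents and queries are compared in the reduced space rather than term by term, a document can receive a nonzero score even when it shares no literal term with the query. A weighted matrix (TF-IDF or a distribution-based global weight) could be passed to `lsi_fit` in place of the raw counts.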