Question Answering System

Implementasi Question Answering System Dengan Metode Rule-Based Pada Terjemahan Al Qur’an Surat Al Baqarah

Anggraeny, Meynar Dwi. 2007.

Answer retrieval begins by parsing a document into sentences. The sentences are then split and stemmed into tokens. The question sentence in the query is likewise split and stemmed into tokens. The tokens of each document sentence and of the query sentence are processed by a rule chosen according to the question type. This research uses five question types: APA (what), SIAPA (who), KAPAN (when), MANA (where), and MENGAPA (why). The rule assigns a score to each document sentence. The sentence with the highest score is returned as the answer sentence. The higher the percentage of relevant sentences among the retrieved sentences, the better the system performs. More than one answer sentence may be returned, because several sentences can share the same highest score. In the system evaluation, the rule for the "SIAPA" question type performed best and the rule for the "MANA" question type performed worst. Overall, the average accuracy of the rules was 85.69% on the queries used in the research and 53.14% on queries supplied by general users.
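The pipeline described above (split into sentences, tokenize, score each sentence with a rule for the question type, return every top-scoring sentence) can be sketched in Python. This is only an illustration under stated assumptions: the clue words, the scoring weights, and the function names are hypothetical and are not the rules used in the thesis, and stemming is left out.

```python
import re

# Hypothetical clue words per question type (illustration only, not the thesis rules).
CLUE_WORDS = {
    "siapa": {"guru", "presiden", "menteri"},
    "kapan": {"hari", "bulan", "tahun"},
}


def tokenize(text):
    """Lowercase and split into word tokens. Stemming is omitted in this sketch."""
    return re.findall(r"[a-z]+", text.lower())


def split_sentences(document):
    """Split on whitespace that follows sentence-ending punctuation."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", document) if s.strip()]


def score_sentence(sentence_tokens, keywords, question_type):
    """Assumed rule: one point per shared word, two per clue word of the type."""
    words = set(sentence_tokens)
    overlap = len(words & set(keywords))
    clues = len(words & CLUE_WORDS.get(question_type, set()))
    return overlap + 2 * clues


def answer(document, query):
    """Return every sentence that shares the highest score."""
    query_tokens = tokenize(query)
    sentences = split_sentences(document)
    if not query_tokens or not sentences:
        return []
    # Assumption: the question word is the first token of the query.
    question_type, keywords = query_tokens[0], query_tokens[1:]
    scores = [score_sentence(tokenize(s), keywords, question_type) for s in sentences]
    best = max(scores)
    return [s for s, sc in zip(sentences, scores) if sc == best]


document = "Budi adalah guru. Rapat dimulai pada hari Senin. Guru pulang."
print(answer(document, "Siapa guru itu?"))
# ['Budi adalah guru.', 'Guru pulang.']
```

The example returns two sentences because both reach the same highest score, which is the tie case the abstract mentions.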


Implementasi Question Answering System dengan Metode Rule-Based pada Banyak Dokumen Berbahasa Indonesia

Sianturi, Romaida Dolarosa. 2008.

A Question Answering System (QAS) using a rule-based method can be applied to build a system that retrieves answers to questions from many documents in Indonesian. When a query in the form of a question is entered, the system returns a sentence as the answer. Answer retrieval begins by parsing a document into sentences. The sentences are then split and stemmed into tokens. The question sentence in the query is likewise split and stemmed into tokens. The tokens of each document sentence and of the query sentence are processed by a rule chosen according to the question type. This research uses five question types: APA (what), SIAPA (who), KAPAN (when), MANA (where), and MENGAPA (why). The rule assigns a score to each document sentence. The sentence with the highest score is returned as the answer sentence. The higher the percentage of relevant sentences among the retrieved sentences, the better the system performs. More than one answer sentence may be returned, because several sentences can share the same score.


Analysis of Question in Indonesian Language in Question Answering System (QAS)

Kartina. 2010.

Question analysis is the first step in a QAS. This step determines the final result of the QAS process, because its output is used to retrieve the relevant documents and the correct answer entity. Question queries are limited to the types WHO, WHERE, WHEN, and HOW MANY or HOW MUCH. The question word in the query is used to obtain answer candidates, while the remaining words are used to analyze the question. Question analysis starts by parsing the keywords into tokens. The parsing considers whether pairs of tokens may form a phrase; forming such phrases is expected to preserve the semantics of the question sentence. The parsed question is then used to retrieve documents and the top passage. The top passage is obtained through heuristic scoring. Answer extraction computes the nearest distance between each answer candidate in the top passage and each keyword. Answer correctness is evaluated using four criteria: right, unsupported, wrong, and null. The evaluation showed more correct answers when fewer documents were used. With 106 test documents, the results were 75.56% right, 2.22% unsupported, 17.78% wrong, and 4.44% null. With 200 test documents, the results were 73.33% right, 22.22% wrong, and 4.44% null. With 300 test documents, the results were 71.11% right, 22.22% wrong, and 6.67% null. This decline is attributed to shortcomings in the passage scoring method used.
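The distance-based answer extraction step can be illustrated with a short Python sketch. It assumes that the passage is already tokenized, that answer candidates have already been identified (for example as location words for a WHERE question), and that "nearest" means the smallest average token distance to the keywords. The abstract does not specify these details, so the function name and the distance measure are assumptions.

```python
def extract_answer(passage_tokens, candidates, keywords):
    """Return the candidate with the smallest average token distance to the keywords.

    Returns None when no keyword or no candidate occurs in the passage.
    """
    keyword_positions = [i for i, tok in enumerate(passage_tokens) if tok in keywords]
    if not keyword_positions:
        return None

    best, best_distance = None, float("inf")
    for i, tok in enumerate(passage_tokens):
        if tok not in candidates:
            continue
        # Average absolute distance from this candidate to every keyword occurrence.
        distance = sum(abs(i - k) for k in keyword_positions) / len(keyword_positions)
        if distance < best_distance:
            best, best_distance = tok, distance
    return best


passage = "budi lahir di bandung pada 1990 dan bekerja di jakarta".split()
locations = {"bandung", "jakarta"}  # assumed WHERE candidates
print(extract_answer(passage, locations, {"budi", "lahir"}))  # bandung
print(extract_answer(passage, locations, {"bekerja"}))        # jakarta
```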


Implementasi Question Answering System dengan Metode Rule-Based untuk Temu Kembali Informasi Berbahasa Indonesia

Ikhsani, Nafi. 2011.

A Question Answering System (QAS) can be applied to build a system that retrieves answers to questions about a reading passage (reading comprehension). When a query in the form of a question is entered, the system returns a sentence as the answer. Answer retrieval begins by parsing a reading document into sentences. The sentences are then split and stemmed into tokens. The question sentence in the query is likewise split and stemmed into tokens. The tokens of each document sentence and of the query are processed by rules chosen according to the question type. This research uses only five question types: APA (what), SIAPA (who), KAPAN (when), MANA (where), and MENGAPA (why). The rules assign a score to each document sentence. Sentences with high scores are returned as answers. More than one sentence may be returned, because several sentences can share the same highest score. The number of sentences retrieved also depends on the score threshold used. The score thresholds used in this research range from 1 to 12. The system performed best with thresholds of 7 and 8, which returned three sentences on average, with 82.5% of results correct. In the evaluation by rule, the APA rule performed best, and the average accuracy of the rules was 74.65%. However, this fairly high accuracy holds only for this research, under the various assumptions it makes.
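The effect of a score threshold on the number of returned sentences can be sketched as follows. The sketch assumes that a sentence is returned when its score is at least the threshold, and it measures the share of returned sentences that are relevant. The function names and the sample scores are hypothetical.

```python
def select_by_threshold(scored_sentences, threshold):
    """Keep sentences whose score is at least the threshold (assumed comparison)."""
    return [sentence for sentence, score in scored_sentences if score >= threshold]


def share_relevant(returned, relevant):
    """Fraction of returned sentences that are relevant; 0.0 if nothing is returned."""
    if not returned:
        return 0.0
    return sum(1 for s in returned if s in relevant) / len(returned)


# Hypothetical scores for four sentences labelled a to d.
scored = [("a", 9), ("b", 7), ("c", 3), ("d", 8)]
returned = select_by_threshold(scored, threshold=7)
print(returned)                              # ['a', 'b', 'd']
print(share_relevant(returned, {"a", "d"}))  # 2 of 3 returned sentences are relevant
```

Raising the threshold returns fewer sentences, and lowering it returns more, which is why the abstract reports results per threshold.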


Implementation of Question Answering System for Document in Bahasa Indonesia with List Question

Umriadi, Agus. 2011.

In the last few years, many studies of Question Answering Systems (QAS) have been conducted by research groups around the world. Lately, questions are presented not only as factoid questions but also as list questions, which require more than one entity as the answer. However, recent QAS development can only accommodate factoid questions, which require a single-entity answer. To address this issue, this research implements a QAS for list questions. To obtain answer candidates, heuristic weighting is performed on the passages contained in the top n documents. One thousand documents and 40 queries are used in the experiment. The best experimental results show correctness of 26%, 39%, 36.33%, and 70% for "who", "how many/much", "where", and "when" list questions, respectively.


Pemilihan Passages dalam Question Answering System untuk Dokumen Berbahasa Indonesia

Sanur, Suci Armelia. 2011.

The first step in the Question Answering System is for the user to enter a question query. Question queries are limited to the types WHO, WHERE, WHEN, and HOW MANY or HOW MUCH. The question word in the query is used to obtain answer candidates, while the remaining words are used to analyze the question. Question analysis starts by parsing the keywords into tokens. The parsed question is used to retrieve documents and the top passage. The top passage is the passage with the highest score for the question. Passages were scored using three methods: rule-based, heuristic, and a combination of rule-based and heuristic. Answer extraction computes the nearest distance between each answer candidate in the top passage and each keyword. Answer correctness is evaluated using four criteria: right, unsupported, wrong, and null.


Implementasi Question Answering System pada Dokumen Bahasa Indonesia Menggunakan Metode N-Gram

Rahmawan, Fandi. 2011.

Recent developments in Question Answering Systems (QAS) can accommodate language modeling mechanisms to generate better results. To develop the system, we use the Indri framework toolkit to obtain a robust document structure and text passages. Documents relevant to a question are first retrieved. The relevant documents are then divided into passages of two sentences each. To obtain answer candidates, n-gram weighting is performed on the passages contained in the top document. The answer is identified by matching annotations between the query and the document. One thousand documents and 40 queries are used in the experiment. The experiment shows 60% correctness for who questions, 30% for how much/many questions, 90% for where questions, and 80% for when questions. Our results show that we need to find ways to improve the effectiveness of finding correct answers, in particular ways of reducing the number of incorrectly tagged words.
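The two steps named above, dividing a document into two-sentence passages and weighting passages by n-grams, can be sketched in plain Python. This does not use Indri. The weighting (each n-gram shared with the query adds n to the score) and the function names are assumptions made for illustration, not the thesis's weighting scheme.

```python
import re


def split_passages(document, size=2):
    """Group consecutive sentences into passages of `size` sentences each."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", document) if s.strip()]
    return [" ".join(sentences[i:i + size]) for i in range(0, len(sentences), size)]


def ngrams(tokens, n):
    """Set of n-grams (as tuples) in a token list; empty if the list is too short."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def ngram_score(query, passage, max_n=3):
    """Assumed weighting: every n-gram shared with the query adds n to the score."""
    q = re.findall(r"\w+", query.lower())
    p = re.findall(r"\w+", passage.lower())
    return sum(n * len(ngrams(q, n) & ngrams(p, n)) for n in range(1, max_n + 1))


print(split_passages("A b. C d. E f."))
# ['A b. C d.', 'E f.']
print(ngram_score("ibu kota indonesia", "Jakarta adalah ibu kota Indonesia."))  # 10
print(ngram_score("ibu kota indonesia", "Kota itu besar."))                     # 1
```

A passage that shares a longer word sequence with the query scores higher than one that shares only isolated words.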


Cross Language Question Answering System Menggunakan Pembobotan Heuristic dan Rule Based

Subu, Selamet. 2012.

Cross Language Question Answering (CL-QAS) means that the question is expressed in a language other than that of the documents from which the answer is extracted. The challenge here was to identify answers to an Indonesian question in a collection of English documents. The evaluation focused on finding answers, so translating the answer from English into Indonesian was not required. The first step in CL-QAS is for the user to enter a question query. Question queries are limited to the types WHO, WHERE, WHEN, and HOW MANY or HOW MUCH. The question word in the query is used to obtain answer candidates, while the remaining words are used to analyze the question. The query is parsed to separate the question word from the rest of the question sentence, the keywords. These keywords are used to retrieve documents and the top passage. Passages are scored using heuristic and rule-based methods. Answer candidates are then extracted from the passage with the highest score. The candidate with the smallest average distance to the keywords is returned as the answer to the query. Answers are evaluated on four criteria: right, unsupported, wrong, and null. The experiment compares heuristic and rule-based passage scoring by examining the top passage and the correct answers produced by each. Heuristic scoring produced the best result, with 92.5% right, 0% unsupported, 7.5% wrong, and 0% null, whereas rule-based scoring produced 90% right, 0% unsupported, 10% wrong, and 0% null.


Temporal Question Answering System Bahasa Indonesia

Darliansyah, Adi. 2012.

Time is an important dimension in information retrieval. Temporal expressions describe time information embedded in documents. Therefore, extraction and normalization of temporal expressions from documents are crucial. In this research, a question answering system is implemented for temporal information processing from documents in Indonesian, based on four types of temporal question beginning with question words such as siapa (who), kapan (when), di mana (where), and berapa (how many). Implicit time references in documents are first normalized and tagged manually into explicit time references. Complex temporal questions are divided into simpler questions by using temporal signal detection for specific sequences of events. In order to obtain answer candidates, heuristic weighting is performed on the top passages. Answer extraction is performed using the smallest distance between query and answer candidates. A corpus containing 100 documents and 80 queries is used in this research. Answer evaluation is based on three criteria, namely, Right, Wrong, and Unsupported. The questions are used to evaluate the results of BM25 and Proximity ranking modes. The evaluation for simple temporal questions (Type 1 and 2) using BM25 and Proximity gave the same results at 85% Right answers for Type 1 and 75% for Type 2. The results for complex temporal questions (Type 3 and 4) indicated good performance. The best results were obtained by BM25 at 95% Right answers for Type 3 and 75% for Type 4, while using Proximity resulted in 85% Right answers for Type 3 and 80% for Type 4. We also used our corpus on a nontemporal question answering system by Umriadi in 2011. The results are 60%, 55%, 60%, and 40% Right answers for Type 1, 2, 3, and 4, respectively, much lower than our temporal question answering system. Therefore, temporal expression extraction and temporal signal identification are particularly important for handling questions containing temporal information. Our system is able to identify and answer temporal questions in Indonesian.
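Splitting a complex temporal question at a temporal signal can be sketched as below. The list of signal words (sebelum "before", setelah and sesudah "after", ketika and saat "when") and the function name are assumptions made for this illustration. The abstract does not list the signals the system detects or say how the resulting sub-questions are formed.

```python
import re

# Assumed Indonesian temporal signal words (illustration only).
SIGNALS = ("sebelum", "setelah", "sesudah", "ketika", "saat")
SIGNAL_PATTERN = re.compile(r"\b(" + "|".join(SIGNALS) + r")\b", re.IGNORECASE)


def split_temporal_question(question):
    """Split a question at its first temporal signal.

    Returns (main_question, signal, event), or None when no signal is found,
    in which case the question is treated as a simple one.
    """
    match = SIGNAL_PATTERN.search(question)
    if match is None:
        return None
    main = question[:match.start()].strip(" ?,")
    event = question[match.end():].strip(" ?,")
    return main, match.group(1).lower(), event


print(split_temporal_question("Siapa yang memimpin rapat setelah direktur pergi?"))
# ('Siapa yang memimpin rapat', 'setelah', 'direktur pergi')
print(split_temporal_question("Kapan rapat dimulai?"))
# None
```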


Pembentukan Passage dalam Question Answering System untuk Dokumen Bahasa Indonesia

Fathi, Syahrul. 2012.

Passages are used by question answering systems to get relevant pieces of documents. This research compared various aspects of passages: overlapping and non-overlapping passages, sentence-based and word-based passages, and passage formation time (before and after indexing). The question types in this research are siapa (who), di mana (where), kapan (when), and berapa (how many). For indexing and retrieval, we used the BM25 and proximity algorithms from Sphinx. Top documents or passages were re-weighted using rules to get passages containing answer candidates. Answer extraction was performed using the smallest distance between the query and the answer candidates. Evaluation was conducted using mean reciprocal rank and answer accuracy (four criteria: Right, Unsupported, Wrong, and Null). The best results with BM25 were obtained for two kinds of passage, namely 20 overlapping words with 80% accuracy and 30 overlapping words with 77.5% accuracy; both treated one tag as one word and were formed after indexing. The best results with proximity were obtained for three kinds of passage, namely 2 overlapping sentences, 2 non-overlapping sentences, and 20 overlapping words, each with 77.5% accuracy; these also treated one tag as one word and were formed after indexing. The average performance based on mean reciprocal rank for passages using BM25 and Proximity is 75.1% and 76.1%, respectively. Passages formed after indexing have better accuracy, which indicates that retrieving relevant documents is important for a question answering system.
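Word-based passages with and without overlap, and the mean reciprocal rank used for evaluation, can be sketched in Python. The window size, the overlap amount, and the function names are illustrative assumptions. The abstract does not say exactly how its "20 overlapping words" passages were constructed, so this shows only one possible reading.

```python
def word_passages(tokens, size, overlap=0):
    """Slide a window of `size` tokens, sharing `overlap` tokens between neighbours."""
    step = size - overlap
    if step <= 0:
        raise ValueError("overlap must be smaller than size")
    if not tokens:
        return []
    # Stop early enough that the last window is not only repeated tokens.
    last_start = max(len(tokens) - overlap, 1)
    return [tokens[i:i + size] for i in range(0, last_start, step)]


def mean_reciprocal_rank(ranks):
    """ranks: 1-based rank of the first correct answer per query, None if not found."""
    if not ranks:
        return 0.0
    return sum(1.0 / r for r in ranks if r) / len(ranks)


tokens = list("abcdefghij")  # ten one-letter "words"
print(word_passages(tokens, size=4, overlap=2))
# [['a','b','c','d'], ['c','d','e','f'], ['e','f','g','h'], ['g','h','i','j']]
print(word_passages(tokens, size=4))
# [['a','b','c','d'], ['e','f','g','h'], ['i','j']]
print(mean_reciprocal_rank([1, 2, None, 4]))  # 0.4375
```

Overlap keeps an answer that straddles a window boundary inside at least one passage, at the cost of producing more passages to index and score.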


Question Answering System Menggunakan N-Gram Term Weight Model

Bahri, Debby Puspa. 2013.

Search engines with a question-query feature, known as question answering systems, are now widely developed. The information provided by such a system must fit a specific user requirement. This research applies a passage selection method using an n-gram term weighting model. The method is evaluated on a set of questions and documents, measuring the accuracy of each answer. One thousand documents and 40 queries are used in this research. The results show an accuracy of 90% for WHO questions, 80% for WHEN questions, 80% for WHERE questions, and 40% for HOW MUCH/MANY questions.


Cross Language Question Answering System Menggunakan Pembobotan Heuristic dan Multidokumen

Mulyanto, Fadila Andre. 2013.

People tend to ask questions when they need information. This becomes difficult whenever the available information is not in a language the person understands or speaks. A Cross Language Question Answering System (CL-QAS) is an information retrieval system that can handle this kind of situation. It accepts a question query as input and outputs the answer in the translated language. In this study, a CL-QAS is developed that takes queries in Indonesian and answers in English. The system output is computed using heuristic weighting over multiple documents. The average time to produce an answer is quite fast, at 3.03 seconds. The system accuracy is good for the following query types: SIAPA (100%), KAPAN (100%), DIMANA (100%), and BERAPA (90%).