Abstract:
Medical data, in the form of patient records, is an ever-growing source of information from hospitals. When mined,
the information in these records is a vast resource for medical research. The data contains hidden
patterns and relationships that can lead to better diagnosis. Unfortunately, these patterns and
relationships often go unexploited. Studies in medical diagnosis have predicted heart diseases,
lung diseases, and various tumors from past data collected from patients. However, these studies are mostly
limited to domain-specific systems that predict only diseases within their own area of operation. The
performance of the k-nearest neighbors (k-NN) classifier is highly dependent on the distance metric used to
identify the k nearest neighbors of a query point. The standard Euclidean distance is commonly used in practice.
This study draws on a large store of patient information so that diagnoses can be made from historical data. It
focuses on computing the probability of occurrence of a particular ailment using a unique k-NN-based algorithm,
which increases the accuracy of such diagnoses. The algorithm can be used to enhance automated diagnosis,
including the diagnosis of multiple diseases that show similar symptoms. To validate the experimental results, a
hypothesis was tested for the following variables: accidents, age, allergies, blood pressure, smoking habit, total
cholesterol, diabetes and hypertension, family history of heart disease, obesity, and lack of physical activity. It was
evident that there was a strong relationship between the above variables and the causes of common chronic
diseases such as heart ailments, diabetes, and cancer.
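
To make the k-NN idea in this abstract concrete, the following is a minimal sketch, not the algorithm used in the study. It assumes that the probability of an ailment is estimated as the fraction of the k nearest historical records (by standard Euclidean distance) that carry that diagnosis. The function names, the three features, and the toy records are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Standard Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_probabilities(records, query, k=3):
    """Estimate label probabilities as the label shares among the k nearest records.

    records: list of (feature_tuple, label) pairs (hypothetical layout).
    query:   feature tuple for the new patient.
    """
    if not 1 <= k <= len(records):
        raise ValueError("k must be between 1 and the number of records")
    # Sort historical records by distance to the query and keep the k closest.
    nearest = sorted(records, key=lambda r: euclidean(r[0], query))[:k]
    counts = Counter(label for _, label in nearest)
    return {label: n / k for label, n in counts.items()}


# Hypothetical toy data: (age, systolic blood pressure, total cholesterol) -> diagnosis.
records = [
    ((45, 120, 180), "healthy"),
    ((50, 125, 190), "healthy"),
    ((62, 150, 260), "heart ailment"),
    ((65, 155, 270), "heart ailment"),
    ((58, 145, 250), "heart ailment"),
]

probs = knn_probabilities(records, (60, 148, 255), k=3)
print(probs)
```

Because Euclidean distance is sensitive to feature scale, a practical version would likely normalize each variable first; that step is omitted here to keep the sketch short.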