Convex Hull (also known as SBQI, Software Build Quality Inspector) is a tool for finding novelties, anomalies and outliers in data using unsupervised and semi-supervised learning. An example of a novelty is an unfamiliar face trying to access a configured face recognition system. An anomaly might be abnormal data produced by machinery that is about to break. An outlier might be a fraudulent payment in data gathered from credit card transactions.
Learning paradigms are usually classified as supervised, semi-supervised or unsupervised, based on the availability of feedback or labels. In supervised learning, each observation in the training data must have a label, which means that an expert must evaluate every measurement. This process can be both time-consuming and costly. In addition, supervised learning assumes that the negative class is well sampled, and in practice this assumption often fails. For example, data from failing machinery is hard to come by. Likewise, it is hard to cover all the ways in which credit card information may be misused.
SBQI uses both unsupervised and semi-supervised learning to relax the assumptions of supervised learning. In unsupervised learning, there are no labels at all. In semi-supervised learning, some observations have labels and the rest do not. Some unsupervised models can also handle classification tasks in which examples of the negative class are scarce.
SBQI utilises one-class classification and positive-unlabelled learning methods. One-class classification means learning from positive data (normal operation) with no known negative examples (novelties, anomalies or outliers). SBQI can therefore, for example, learn the limits of normally operating machinery and warn of impending failure when those limits are reached. Positive-unlabelled learning means learning from unlabelled data that contains both classes, together with only a few known positive examples. In this way, positive-unlabelled learning can find the pictures of a certain person’s face in a database of pictures based on a few example pictures of that person.
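The one-class idea described above can be illustrated with a minimal sketch. It is not SBQI's own implementation: it assumes scikit-learn's OneClassSVM as the model, uses synthetic Gaussian data in place of real sensor readings, and the function name fit_normal_operation and the parameter value nu=0.05 are hypothetical choices made for illustration.

```python
import numpy as np
from sklearn.svm import OneClassSVM


def fit_normal_operation(normal_readings, nu=0.05):
    """Fit a one-class model using positive (normal-operation) data only.

    nu is an upper bound on the fraction of training points treated as
    outliers; 0.05 is an illustrative assumption, not a recommended value.
    """
    model = OneClassSVM(kernel="rbf", gamma="scale", nu=nu)
    model.fit(normal_readings)
    return model


# Synthetic stand-in for two sensor channels of a healthy machine.
rng = np.random.default_rng(0)
normal_readings = rng.normal(loc=0.0, scale=1.0, size=(500, 2))

model = fit_normal_operation(normal_readings)

# predict() returns +1 for points inside the learned boundary and -1 for
# points outside it. The second reading lies far from all training data.
new_readings = np.array([[0.1, -0.2], [8.0, 8.0]])
labels = model.predict(new_readings)
print(labels)
```

No negative examples are needed at training time: the model only describes the region occupied by normal data, and anything falling outside that region is flagged.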
The software design of SBQI is based on a modular architecture. Each module is designed to handle one task, such as reading data, preparing data, classification or viewing the results. A full program is a combination of suitable modules that interact with each other. The modular architecture provides flexibility, so that SBQI can solve a wide variety of classification problems. Besides the pre-installed modules, the user can also create new modules for SBQI as needed.
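As a rough illustration of how single-purpose modules might be chained into a full program, consider the sketch below. It does not show SBQI's actual module interface, which is not described here: every name (read_data, prepare, classify, view, run_pipeline) is hypothetical, and each module is assumed to be a plain function that takes the previous module's output.

```python
from typing import Any, Callable, List

# Assumption: a module is any callable mapping one input to one output.
Module = Callable[[Any], Any]


def read_data(_: Any) -> List[float]:
    """Hypothetical reader module: returns hard-coded measurements."""
    return [1.0, 1.2, 0.9, 1.1, 5.0]


def prepare(values: List[float]) -> List[float]:
    """Hypothetical preparation module: centre the values on their median."""
    ordered = sorted(values)
    median = ordered[len(ordered) // 2]
    return [v - median for v in values]


def classify(values: List[float]) -> List[bool]:
    """Hypothetical classifier module: flag values more than 1.0 from the median."""
    return [abs(v) > 1.0 for v in values]


def view(flags: List[bool]) -> str:
    """Hypothetical viewer module: render the flags as text."""
    return ", ".join("outlier" if flag else "normal" for flag in flags)


def run_pipeline(modules: List[Module], initial: Any = None) -> Any:
    """Run the modules in order, feeding each output to the next module."""
    result = initial
    for module in modules:
        result = module(result)
    return result


report = run_pipeline([read_data, prepare, classify, view])
print(report)
```

Under this assumption, swapping the classifier or adding a new preparation step only changes the list passed to run_pipeline, which is the kind of flexibility the modular design aims for.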
SBQI is written in Python. Python was chosen because it has extensive numerical computing and machine learning libraries (NumPy, SciPy and scikit-learn). Python code is also easy to read and quick to prototype. SBQI can execute modules written in compiled languages, such as C, through a subprocess interface. This shortens execution times for computationally expensive tasks.
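A minimal sketch of calling an external program from Python is given below, using the standard library's subprocess module. It is only an assumption about how such an interface could look, since SBQI's actual subprocess interface is not specified here. The function name run_external_module is hypothetical, data exchange over stdin and stdout as plain text is assumed, and the Python interpreter itself stands in for a compiled C executable so that the example runs anywhere.

```python
import subprocess
import sys


def run_external_module(command, payload, timeout=30.0):
    """Send text to an external program on stdin and return its stdout.

    check=True raises CalledProcessError if the program exits with a
    non-zero status, so failures are not silently ignored.
    """
    completed = subprocess.run(
        command,
        input=payload,
        capture_output=True,
        text=True,
        check=True,
        timeout=timeout,
    )
    return completed.stdout


# Stand-in for a compiled module: a one-line program that sums the
# whitespace-separated numbers it receives on stdin. In a real setup,
# this list would hold the path to a compiled binary instead.
stand_in = [
    sys.executable,
    "-c",
    "import sys; print(sum(float(x) for x in sys.stdin.read().split()))",
]

output = run_external_module(stand_in, "1.5 2.5 3.0")
print(output.strip())
```

Passing the command as a list rather than a shell string avoids shell quoting problems, and the timeout keeps a stalled external module from blocking the rest of the program.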