Figure 1
The data flow in our proposed method for measuring similarity between materials with respect to specific target physical properties, drawn using the MapReduce representation language. The process consists of two subprocesses: (a) an exhaustive test of all combinations of predicting variables, from which we select the best combinations, i.e. those yielding the most likely regression models, and (b) regression-based clustering to search for partition models that break the data set down into separate smaller data sets, so that the target variable in each subset can be predicted by a different linear model. We can obtain a prediction model with higher predictive accuracy by taking an ensemble average of the models yielded in (a). We use the partitioning models obtained in (b) to construct a committee machine that votes for the similarity between materials.
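
Subprocess (a) and the ensemble average can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes ordinary least-squares models, k-fold cross-validated mean squared error as the selection score, and a plain unweighted average over the best few models. All function names (`exhaustive_search`, `ensemble_predict`, and the helpers) are hypothetical, and the MapReduce parallelization and subprocess (b) are not shown.

```python
from itertools import combinations

import numpy as np


def fit_linear(X, y):
    """Least-squares fit with an intercept; returns the coefficient vector."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict_linear(coef, X):
    """Predict with coefficients produced by fit_linear."""
    A = np.column_stack([np.ones(len(X)), X])
    return A @ coef


def cv_error(X, y, n_folds=5):
    """Mean squared error under k-fold cross-validation (assumed score)."""
    idx = np.arange(len(y))
    errors = []
    for test in np.array_split(idx, n_folds):
        train = np.setdiff1d(idx, test)
        coef = fit_linear(X[train], y[train])
        residual = predict_linear(coef, X[test]) - y[test]
        errors.append(np.mean(residual ** 2))
    return float(np.mean(errors))


def exhaustive_search(X, y, max_size=None):
    """Score every non-empty combination of predicting variables.

    Returns a list of (cv_error, variable_indices) sorted best first.
    """
    n_vars = X.shape[1]
    max_size = n_vars if max_size is None else max_size
    results = []
    for size in range(1, max_size + 1):
        for combo in combinations(range(n_vars), size):
            results.append((cv_error(X[:, list(combo)], y), combo))
    results.sort(key=lambda item: item[0])
    return results


def ensemble_predict(X_train, y_train, X_new, ranked, top_k=3):
    """Unweighted average of the predictions of the top_k ranked models."""
    predictions = []
    for _, combo in ranked[:top_k]:
        cols = list(combo)
        coef = fit_linear(X_train[:, cols], y_train)
        predictions.append(predict_linear(coef, X_new[:, cols]))
    return np.mean(predictions, axis=0)
```

The number of combinations grows as 2^n in the number of predicting variables, which is why the exhaustive test is a natural candidate for a distributed formulation; `max_size` is included here only as a way to cap the cost in the sketch.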

IUCrJ
Volume 5 | Part 6 | November 2018 | Pages 830-840
ISSN: 2052-2525