Classification of crystal structure using a convolutional neural network

A deep-machine-learning technique based on a convolutional neural network (CNN) is introduced and employed to classify the crystal system, extinction group and space group of inorganic materials from their powder X-ray diffraction patterns.

The auto-peak-search was first carried out using suitable peak-profile parameters (peak threshold, background threshold and shoulder threshold) so as to exclude some of the weak-intensity peaks of the Ca1.5Ba0.5Si5N6O3 system, as shown in Figure S1(a). The peaks thus obtained were then indexed using the TREOR program. The indexing yielded no acceptable solution, only a few probable solutions with very low figures of merit. An alternative choice of the profile parameters in the auto-peak-search procedure was then made to account for the weaker-intensity peaks, as shown in Figure S1, Table S1 and Table S2. The crystal structure of the Ca1.5Ba0.5Si5N6O3 system was later indexed as monoclinic in the Cm space group, with a great deal of human intervention.

Figure S1
Peak positions of the XRD pattern obtained under different auto-peak-search parameters (peak threshold, background threshold and shoulder threshold) for the Ca1.5Ba0.5Si5N6O3 system.

Table S1
Output of indexing results using peak positions of the Ca1.5Ba0.5Si5N6O3 system shown in Figure S1.

In conclusion, human intervention is initially required to identify each peak correctly (even the peaks with very weak intensities). After extensive analysis, the final crystal structure was indexed as monoclinic in the Cm space group with lattice parameters a = 7.07033(2) Å and b = 23.86709(7) Å.

S2. Indexing of the synchrotron X-ray powder diffraction data on the BaAlSi4O3N5:Eu2+ system
Indexing of the peaks obtained through an auto-peak-search for the BaAlSi4O3N5:Eu2+ system shown in Figure S2 using the TREOR program yielded a probable structural solution with a monoclinic crystal system. In addition, there were other solutions with identical figures of merit in the output file. All the solutions obtained in the TREOR output file are shown in Table S3. The suggested solution (monoclinic crystal system) was not found to be valid, since the observed and calculated data did not match well in the structural refinement. The crystal structure of the BaAlSi4O3N5:Eu2+ system was later indexed as orthorhombic in the A21am space group, with lattice parameters a = 9.48461(3) Å, b = 13.47194(6) Å, c = 5.77323(2) Å and α = β = γ = 90°.
Comment: Among the five solutions given in the output file, the best solution for the crystal system was suggested as monoclinic (result no. 5) by the TREOR program, even though it has 7 unindexed peaks. It should be noted that some of the unindexed peaks in the peak list do not belong to any impurity phase, so they should not have been neglected in the indexing.
The suggestion of the monoclinic system was perhaps based on the smallest cell volume obtained during indexing, in spite of the unacceptable figure of merit. However, further structural analysis using the suggested parameters resulted in a very poor match between the observed and calculated data, and the structure turned out to be orthorhombic.

S3. Basics of Artificial Neural Networks (ANNs) (Some of the text is taken from http://cs231n.github.io/)
An artificial neural network (ANN) is a machine-learning approach that models the human brain: it consists of a number of artificial neurons (information-processing units fundamental to the operation of a neural network) linked together according to a specific network architecture. The objective of the neural network is to transform the inputs into meaningful outputs. ANNs are used in many scientific disciplines to solve a variety of problems, such as pattern recognition, prediction and optimization.
Biological systems (humans/animals) are able to react adaptively to changes in their external and internal environment, and they use their nervous system to perform these behaviours. The basic computational unit of the brain is the neuron; the human nervous system contains about 86 billion neurons connected by roughly 10^14-10^15 synapses. Each neuron receives input signals from its dendrites and produces output signals along its single axon. The axon eventually branches out and connects via synapses to the dendrites of other neurons. A schematic drawing of a biological neuron and a common mathematical model is shown in Figure S3. In the computational model of Figure S3(b), the neuron, input, output and weight (w) are analogous to the cell body (also called the soma), dendrite, axon and synapse of the biological system, respectively. The input signals that travel along the axons (e.g. x0) interact multiplicatively (e.g. w0x0) with the dendrites of the other neuron according to the synaptic strength at that synapse (e.g. w0). The idea is that the synaptic strengths (the weights w) are learnable and control the strength of influence (and its direction: excitatory for a positive weight, inhibitory for a negative weight) of one neuron on another. In the basic model, the dendrites carry the signals to the cell body, where they are all summed. If the final sum is above a certain threshold, the neuron fires, sending a spike along its axon, and the frequency of firing communicates information. Similarly, the model neuron computes the weighted sum of the input signals and compares the result with a threshold value. If the net input is less than the threshold, the neuron output is -1; if the net input is greater than or equal to the threshold, the neuron becomes activated and its output attains the value +1.
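The threshold neuron described above can be sketched in a few lines of Python (the weights and threshold used here are purely illustrative):

```python
import numpy as np

def neuron_output(x, w, threshold=0.0):
    """Bipolar threshold neuron: if the weighted sum of the inputs
    reaches the threshold the neuron fires (+1), otherwise its
    output is -1, as described in the text."""
    net = np.dot(w, x)  # weighted sum w0*x0 + w1*x1 + ...
    return 1 if net >= threshold else -1

# Illustrative weights: w0 excitatory (positive), w1 inhibitory (negative)
x = np.array([1.0, 1.0])
w = np.array([0.7, -0.2])
print(neuron_output(x, w))  # net = 0.5 >= 0, so the neuron fires: 1
```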
Commonly used activation functions in neural networks are the step function; the sigmoid function, σ(x) = 1/(1 + e^(-x)), which takes a real-valued number and "squashes" it into the range between 0 and 1; and tanh, which squashes a real-valued number into the range [-1, 1]. The Rectified Linear Unit (ReLU) has become very popular in recent years; it computes the function f(x) = max(0, x). In other words, the activation is simply thresholded at zero.
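Each of these activation functions is a one-liner in NumPy (a minimal sketch; the function names are our own):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))  # squashes any real number into (0, 1)

def tanh(x):
    return np.tanh(x)                # squashes any real number into [-1, 1]

def relu(x):
    return np.maximum(0.0, x)        # simply thresholds the activation at zero

print(sigmoid(0.0), tanh(0.0), relu(-3.0), relu(2.0))  # 0.5 0.0 0.0 2.0
```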
Neural Networks are modeled as collections of neurons connected in an acyclic graph. For regular neural networks, the most common layer type is the fully-connected layer, in which neurons between two adjacent layers are fully pairwise connected but neurons within a single layer share no connections. Figure S4 shows two examples of Neural Network topologies that use a stack of fully-connected layers.
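A stack of fully-connected layers such as those in Figure S4 reduces to repeated matrix-vector products followed by a non-linearity (a sketch with illustrative layer sizes and random weights, not the actual network of the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

def fc_layer(x, W, b):
    """Fully-connected layer: every output neuron sees every input."""
    return np.maximum(0.0, W @ x + b)  # ReLU non-linearity

# Illustrative 3-4-4-1 topology: input, two hidden layers, output
x = rng.normal(size=3)
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)
W2, b2 = rng.normal(size=(4, 4)), np.zeros(4)
W3, b3 = rng.normal(size=(1, 4)), np.zeros(1)

h1 = fc_layer(x, W1, b1)
h2 = fc_layer(h1, W2, b2)
out = W3 @ h2 + b3  # output layer: raw scores, no non-linearity
print(out.shape)    # (1,)
```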

S4. Detailed description of the Convolutional Neural Networks (CNNs) (Some of the text is taken from http://cs231n.github.io/)
Convolutional Neural Networks are very similar to the ordinary Neural Networks described in the previous section: they are made up of neurons that have learnable weights and biases. Each neuron receives some inputs, performs a dot product and optionally follows it with a non-linearity. The whole network still expresses a single differentiable score function, from the raw image pixels on one end to class scores at the other. It still has a loss function (e.g. support vector machine (SVM) or Softmax) on the last (fully-connected) layer, and all the tips and tricks developed for learning regular Neural Networks still apply. The only difference is that CNN architectures make the explicit assumption that the inputs are images, which allows certain properties to be encoded into the architecture. These make the forward function more efficient to implement and vastly reduce the number of parameters in the network. As described in the previous section, a Neural Network receives an input (a single vector) and transforms it through a series of hidden layers. Each hidden layer is made up of a set of neurons, where each neuron is fully connected to all neurons in the previous layer, and where neurons within a single layer function completely independently and share no connections. The last fully-connected layer is called the "output layer", and in classification settings it represents the class scores. Regular Neural Nets can work well with small images. For an image of size 32x32x3 (32 wide, 32 high, 3 color channels), a single fully-connected neuron in the first hidden layer would have 32*32*3 = 3072 weights, which is still manageable. However, a larger image, for example 200x200x3, would lead to neurons that each have 200*200*3 = 120,000 weights. Since we certainly want several such neurons, the parameters would add up quickly.
Clearly, this full connectivity is wasteful and the huge number of parameters would quickly lead to overfitting.
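The weight-count arithmetic above is easy to verify (`fc_neuron_weights` is our own illustrative helper name):

```python
def fc_neuron_weights(width, height, channels):
    """Weights needed by ONE fully-connected neuron that sees every pixel."""
    return width * height * channels

print(fc_neuron_weights(32, 32, 3))    # 3072 weights: still manageable
print(fc_neuron_weights(200, 200, 3))  # 120000 weights -- per neuron
```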
Convolutional Neural Networks on the other hand take advantage of the fact that the input consists of images and they constrain the architecture in a more sensible way. In particular, unlike a regular Neural Network, the layers of a CNN have neurons arranged in 3 dimensions: width, height, depth as shown in Figure S5. The word depth here refers to the third dimension of an activation volume, not to the depth of a full Neural Network, which can refer to the total number of layers in a network.

Figure S5
(a) A regular 3-layer Neural Network. (b) A CNN arranges its neurons in three dimensions (width, height, depth), as visualized in one of the layers. Every layer of a CNN transforms the 3D input volume to a 3D output volume of neuron activations. In this example, the red input layer holds the image, so its width and height are the dimensions of the image, and the depth is 3 (red, green and blue channels).

[Figures reproduced from http://cs231n.github.io/convolutional-networks/]
A simple CNN is a sequence of layers, and every layer transforms one volume of activations to another through a differentiable function. Three main types of layers are used to build CNN architectures: the Convolutional Layer, the Pooling Layer and the Fully-Connected Layer (exactly as in regular Neural Networks). These layers are stacked to form a full CNN architecture. An example of a CNN for classifying an image of a car is shown in Figure S6. CNNs transform the original image layer by layer, from the raw pixel values to the final class scores. The parameters in the CONV/FC layers are trained with gradient descent so that the class scores the CNN computes are consistent with the label of each image in the training set.

Example: A simple CNN for image classification could have the architecture [INPUT -CONV -RELU -POOL -FC].
- INPUT [e.g. a 32x32x3 image] holds the raw pixel values of the image, in this case an image of width 32, height 32, and three color channels R, G, B.
- CONV layer computes the output of neurons that are connected to local regions in the input, each computing a dot product between their weights and the small region they are connected to in the input volume. This may result in a volume such as [32x32x12] if we decide to use 12 filters.
- RELU layer applies an elementwise activation function, such as the max(0, x) thresholding at zero. This leaves the size of the volume unchanged ([32x32x12]).
- POOL layer performs a downsampling operation along the spatial dimensions (width, height), resulting in a volume such as [16x16x12].
- FC (i.e. fully-connected) layer computes the class scores, resulting in a volume holding one score per class.
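The way each layer reshapes the activation volume can be traced with plain array shapes (a shape-only sketch: the convolution itself is stood in for by a zero array of the correct output shape, and the 10 output classes are a hypothetical choice):

```python
import numpy as np

# Trace the activation volume through [INPUT - CONV - RELU - POOL - FC]
x = np.zeros((32, 32, 3))                       # INPUT: 32x32 RGB image

# CONV with 12 filters and 'same' padding keeps width/height, depth -> 12
conv_out = np.zeros((x.shape[0], x.shape[1], 12))

relu_out = np.maximum(0.0, conv_out)            # RELU: shape unchanged
assert relu_out.shape == (32, 32, 12)

# POOL: 2x2 downsampling along the spatial dimensions only
pool_out = relu_out[::2, ::2, :]
print(pool_out.shape)                           # (16, 16, 12)

# FC: flatten, then one weight row per class (10 hypothetical classes)
W = np.zeros((10, pool_out.size))
scores = W @ pool_out.ravel()
print(scores.shape)                             # (10,)
```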

Figure S6
The initial volume stores the raw image pixels (left) and the last volume stores the class scores (right). Each volume of activations along the processing path is shown as a column.
Since it is difficult to visualize 3D volumes, each volume's slices are laid out in rows. The last layer volume holds the scores for each class, of which the sorted top 5 are shown [figures reproduced from http://cs231n.github.io/convolutional-networks/].

S5. Training methods
Both supervised and unsupervised processes may be adopted for the training. In supervised training, both the inputs and the desired outputs are provided. The network processes the inputs and compares its resulting outputs against the desired outputs. Errors (the differences between the desired and the actual outputs) are then propagated back through the system, causing the system to adjust the weights which control the network. This process occurs over and over as the weights are continually tweaked. The set of data which enables the training is called the training set; during the training of a network, the same set of data is processed many times as the connection weights are refined. In unsupervised training, by contrast, the network is provided with inputs but not with desired outputs. The network is then required to teach itself based on some structure in the input data; this is often referred to as self-organization or adaptation.
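The supervised weight adjustment described above can be illustrated with the classic delta rule on a single linear neuron (a minimal sketch; the tiny training set and learning rate are illustrative, not data from this work):

```python
import numpy as np

def train_supervised(X, y, lr=0.1, epochs=50):
    """Minimal supervised training: repeatedly compare the produced output
    with the desired output and nudge the weights against the error
    (the delta rule). The same training set is reused every epoch."""
    rng = np.random.default_rng(0)
    w = rng.normal(size=X.shape[1])
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            out = np.dot(w, xi)
            error = yi - out        # desired output minus actual output
            w += lr * error * xi    # adjust weights to reduce the error
    return w

# Learn y = 2*x0 - x1 from a tiny illustrative training set
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]])
y = np.array([2.0, -1.0, 1.0, 3.0])
w = train_supervised(X, y)
print(np.round(w))  # approximately [ 2. -1.]
```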