Figure 3
Network architecture. We use an architecture inspired by pre-activation ResNets. It takes patches of size 192 × 96 as input and processes them with a sequence of eight pre-activation residual blocks. Downsampling is implemented via strided convolution. The network starts with 16 filters and doubles the number of filters with each downsampling operation, up to a maximum of 256. Global average pooling reduces the final feature representation (shape 6 × 6) to a vector, which the classification layer uses to distinguish single from non-single hits. The size of the feature representation is indicated above each residual block; for example, 16 × 192 × 96 denotes 16 convolutional filters, each producing a feature map of size 192 × 96.
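The shape bookkeeping described in the caption can be traced with a short sketch. The caption does not state which of the eight blocks downsample or with what strides, so `BLOCK_STRIDES` below is a hypothetical schedule, chosen only because it reproduces the stated 192 × 96 input, the 16 × 192 × 96 label, and the final 6 × 6 feature map. With stride-2 halving, going from 192 × 96 to 6 × 6 needs five halvings along the first axis and four along the second, so one block is assumed to downsample along the first axis only. The 3 × 3 kernel with padding 1 is likewise an assumption.

```python
MAX_FILTERS = 256  # cap on the number of filters, as stated in the caption

# Hypothetical (stride_h, stride_w) per residual block; NOT taken from the paper.
# Picked so that a 192 x 96 input ends as a 6 x 6 feature map after eight blocks.
BLOCK_STRIDES = [(1, 1), (2, 1), (2, 2), (1, 1), (2, 2), (2, 2), (1, 1), (2, 2)]


def conv_out(size, stride, kernel=3, padding=1):
    """Output length of a convolution along one axis (assumed 3x3, padding 1)."""
    return (size + 2 * padding - kernel) // stride + 1


def trace_shapes(height=192, width=96, filters=16, strides=BLOCK_STRIDES):
    """Return the (filters, height, width) shape after each residual block."""
    shapes = []
    for stride_h, stride_w in strides:
        # Double the filter count at every downsampling block, capped at 256.
        if (stride_h, stride_w) != (1, 1):
            filters = min(2 * filters, MAX_FILTERS)
        height = conv_out(height, stride_h)
        width = conv_out(width, stride_w)
        shapes.append((filters, height, width))
    return shapes


if __name__ == "__main__":
    for index, shape in enumerate(trace_shapes(), start=1):
        print(f"block {index}: {shape[0]} x {shape[1]} x {shape[2]}")
```

Under these assumptions the last block yields 256 feature maps of size 6 × 6, so global average pooling would hand a 256-dimensional vector to the classification layer.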