When an isomorphous difference Patterson is calculated, GraphEnt will plot
the normal probability diagram of the input data, together with a
reference dotted line of gradient 1.0 and zero intercept (footnote 7).
The use of normal probability plots for assessing the usefulness (or otherwise) of
a putative derivative is well documented and will not be discussed here
(see Howell, P.L. & Smith, G.D. (1992),
J. Appl. Cryst., 25, 81-86, and Abrahams, S.C. & Keve, E.T. (1971),
Acta Cryst., A 27, 157-165). If you scaled your (macromolecular)
data using the program
scaleit from the CCP4 suite, then although you have not seen the plot, you have seen
the variation of its gradient and intercept versus resolution (using the program
xloggraph on the .log file written by scaleit).
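As a rough illustration of what "gradient and intercept" mean for such a plot, the sketch below fits a straight line by ordinary least squares to (expected, observed) pairs. The function name and the toy numbers are hypothetical, and this is not the code used by scaleit or GraphEnt. For well-behaved data the gradient should be close to 1.0 and the intercept close to zero.

```python
def fit_line(expected, observed):
    """Least-squares gradient and intercept of observed versus expected."""
    n = len(expected)
    if n < 2 or n != len(observed):
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(expected) / n
    mean_y = sum(observed) / n
    sxx = sum((x - mean_x) ** 2 for x in expected)
    if sxx == 0.0:
        raise ValueError("expected values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(expected, observed))
    gradient = sxy / sxx
    intercept = mean_y - gradient * mean_x
    return gradient, intercept


if __name__ == "__main__":
    # Toy data lying exactly on observed = 2 * expected + 1 (not real data).
    xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
    ys = [2.0 * x + 1.0 for x in xs]
    print(fit_line(xs, ys))
```

A gradient well above 1.0 would indicate differences larger than the quoted standard deviations can account for.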
The reason for repeating the calculation here is that the normal probability
plot can also be used to select suspect data that do not fit an otherwise linear trend.
The important point is that the selection is not based on the magnitude
of the difference alone (i.e.
||FPH| - |FP||, as happens in scaleit), but on
both the observed amplitudes and their standard deviations. The normal probability plot,
together with the "large contributions to Chi**2" table (files
CHIcontributions.dat and CHIcontributions.ps),
which is produced once the calculation is over, should allow you to select
outliers justifiably (footnote 8).
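The idea of weighting each difference by its standard deviations can be sketched as follows. This is a minimal sketch under stated assumptions, not GraphEnt's implementation: the statistic delta = (FPH - FP) / sqrt(sig(FP)^2 + sig(FPH)^2) and the plotting position (rank - 0.5) / n are both assumed, and the function names are hypothetical. The plotting position does appear to reproduce the fourth column of the Normplot_tails.dat listing shown later in this section for n = 363, but that is an inference. The three sample rows are taken from the MAXENT_AUTO.IN listing in the worked example (h, k, l, FP, sig(FP), FPH, sig(FPH)).

```python
from math import sqrt
from statistics import NormalDist

# Three rows copied from the MAXENT_AUTO.IN listing in the worked example.
SAMPLE = [
    (-30, 0, 9, 89.88602, 3.43968, 123.75751, 12.84017),
    (-30, 0, 10, 126.17858, 3.93975, 110.84611, 10.25688),
    (-30, 0, 11, 38.71215, 5.14720, 36.43570, 15.66436),
]


def normalised_differences(reflections):
    """Return (h, k, l, delta) using the assumed sigma-weighted difference."""
    out = []
    for h, k, l, fp, sig_fp, fph, sig_fph in reflections:
        delta = (fph - fp) / sqrt(sig_fp ** 2 + sig_fph ** 2)
        out.append((h, k, l, delta))
    return out


def normal_probability_points(deltas):
    """Pair each ordered delta with its expected standard normal quantile."""
    ordered = sorted(deltas, key=lambda row: row[3])
    n = len(ordered)
    unit_normal = NormalDist()
    points = []
    for rank, (h, k, l, delta) in enumerate(ordered, start=1):
        expected = unit_normal.inv_cdf((rank - 0.5) / n)  # assumed plotting position
        points.append((h, k, l, expected, delta))
    return points


if __name__ == "__main__":
    for point in normal_probability_points(normalised_differences(SAMPLE)):
        print("%4d %4d %4d %+9.5f %+9.5f" % point)
```

Plotting the last column against the fourth gives the normal probability plot. A large |FPH| - |FP| with large standard deviations ends up near the line, whereas the same difference with small standard deviations ends up in the tails.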
This is achieved as follows : GraphEnt will write out an ASCII file (named Normplot_tails.dat) which contains the hkl indices of all reflections that make up the tails of the plot. These points are shown in the graphics window in a different colour. If some of these points deviate significantly from the rest of the plot, then they are candidates for rejection. Note that some deviation from linearity will always be present near the tails; what you are looking for is an outstanding deviation.
You can then match what you see in the plot with what is written in
Normplot_tails.dat, decide which reflections to exclude, write their indices in an
ASCII file with the name REJECT.HKL,
and then re-run the program using the MAXENT_AUTO.IN file
after adding the keyword REJECT (see page ). Because this sounds
quite complicated, I will now give a detailed example to show how it works :
We start with just one .mtz file containing data for a putative derivative :
crystal2 ~/test
crystal2 ~/test d
total 260
-rw-r--r-- 1 glykos sys 262300 Dec 16 15:45 from_scaleit.mtz
crystal2 ~/test
crystal2 ~/test mtzdump hklin from_scaleit.mtz
1##########################################################
##########################################################
##########################################################
### CCP PROGRAM SUITE: MTZDUMP VERSION 3.5: 18/06/98##
##########################################################
User: glykos Run date: 12/16/99 Run time:15:47:00

Please reference: Collaborative Computational Project, Number 4. 1994.
"The CCP4 Suite: Programs for Protein Crystallography". Acta Cryst. D50, 760-763.
as well as any specific reference in the program write-up.

Optional input follows. Keywords:
RESO STATS LRESO HEAD NREF START SKIP SYMM BATCH FORMAT RUN/GO/END
RESO max min - resolution limits for listing (default all)
STATS [NBIN num] [RESO max min] - no. of reso. bins and limits for stats.
LRESO - S is given for each listed reflection
HEAD - print MTZ file header only
NREF num - number of reflections listed (default 10)
START H0 K0 L0 - first reflection listed (default first)
SKIP nskip - no. of refls. skipped before listing (default 0)
SYMMETRY - list symmetry info
VALM num - missing data set to this value
BATCH - list batch orientation blocks
FORMAT fmt - format of listed refls. .e.g. '(3i4,10f8.2)'
RUN/GO/END - to start dump
go

(Q)QOPEN allocated # 1
User: glykos Logical Name: HKLIN
Status: READONLY Filename: from_scaleit.mtz

* Title: Compare.
* Number of Columns = 9
* Number of Reflections = 7215
* Missing value set to NaN in input mtz file
* HISTORY for current MTZ file :
From SCALEIT, 12/16/99 14:23:39 anisotropic scale
From TRUNCATE, 11/16/99 16:30:06
From SORTMTZ, 11/16/99 16:30:05 using keys: H K L IMEAN
From scalepack2mtz run on 11/16/99
From SCALEPACK2MTZ, 11/16/99 16:30:05 from data from CAD on 11/16/99
data from CAD on 12/16/99
* Column Labels : H K L FP SIGFP FPH SIGPH DPH SIGDPH
* Column Types : H H H F Q F Q D Q
* Cell Dimensions : 94.149 24.170 64.319 90.000 130.367 90.000
* Resolution Range : 0.00078 0.24498 ( 35.806 - 2.020 A )
* Sort Order : 1 2 3 4 0
* Space group = C2 (number 5)

OVERALL FILE STATISTICS for resolution range 0.001 - 0.245
=======================

Col Sort     Min     Max     Num       %     Mean     Mean   Resolution    Type Column
num order                 Missing complete            abs.    Low    High       label

  1 ASC      -46      35       0  100.00    -11.3     18.0  35.81    2.02   H   H
  2 NONE       0      11       0  100.00      4.0      4.0  35.81    2.02   H   K
  3 NONE       0      31       0  100.00     12.3     12.3  35.81    2.02   H   L
  4 NONE     4.4   902.0       3   99.96    92.65    92.65  35.81    2.02   F   FP
  5 NONE     0.6    26.2       3   99.96     3.34     3.34  35.81    2.02   Q   SIGFP
  6 NONE     8.7   956.3    3500   51.49   137.07   137.07  18.78    2.50   F   FPH
  7 NONE     1.2    41.5    3500   51.49     7.95     7.95  18.78    2.50   Q   SIGPH
  8 NONE   -73.2    72.3    3718   48.47     0.29     7.29  18.78    2.51   D   DPH
  9 NONE     0.0    66.8    3718   48.47    11.85    11.85  18.78    2.51   Q   SIGDPH

No. of reflections used in FILE STATISTICS 7215

LIST OF REFLECTIONS
===================

-46   0  17    24.97    13.10  ?  ?  ?  ?
-46   0  18    14.22     8.91  ?  ?  ?  ?
-46   0  19    29.73     9.50  ?  ?  ?  ?
-46   0  20    18.41    10.46  ?  ?  ?  ?
-46   0  21    17.12     9.93  ?  ?  ?  ?
-46   0  22    76.57     3.66  ?  ?  ?  ?
-45   1  14    25.98     8.23  ?  ?  ?  ?
-45   1  15    20.69     7.89  ?  ?  ?  ?
-45   1  16    26.74     7.56  ?  ?  ?  ?
-45   1  17    67.52     5.48  ?  ?  ?  ?

MTZDUMP: Normal termination of mtzdump
Times: User: 0.2s System: 0.1s Elapsed: 0:03
crystal2 ~/test
crystal2 ~/test
Then, we run GraphEnt on the centrosymmetric [010] projection :
crystal2 ~/test
crystal2 ~/test GraphEnt h0l 10 3 from_scaleit.mtz
___________________________________________________________________________________________________________________________
[GraphEnt ASCII-art banner]
Gull, S.F. & Daniell, G.J. (1978), Nature, 272, 686-690
Collins, D.M. (1982), Nature, 298, 49-51
NMG
___________________________________________________________________________________________________________________________
- Assuming that input is a .mtz file. Interpreting ...
User: glykos Logical Name: from_scaleit.mtz
Status: READONLY Filename: from_scaleit.mtz

HEADER INFORMATION FROM INPUT MTZ FILE ON INDEX 1
* Title: Compare.
* Number of Columns = 9
* Number of Reflections = 7215
* Missing value set to NaN in input mtz file
* Column Labels : H K L FP SIGFP FPH SIGPH DPH SIGDPH
* Column Types : H H H F Q F Q D Q
* Cell Dimensions : 94.149 24.170 64.319 90.000 130.367 90.000
* Resolution Range : 0.00078 0.24498 ( 35.806 - 2.020 A )
* Sort Order : 1 2 3 4 0
* Space group = C2 (number 5)

- Accepted column type combination found.
- Zone selection with MTZUTILS.
- Proceed to data expansion with CAD.
- Final .mtz pass : preparing ascii file for MAXENT.
Synthesis type set to : Isomorphous difference Patterson synthesis with h,k,l,FP,sig(FP),FPH,sig(FPH) (anomalous data present)
(Q)QOPEN allocated # 1
User: glykos Logical Name: FOR_MAXENT_P1.mtz
Status: READONLY Filename: FOR_MAXENT_P1.mtz

HEADER INFORMATION FROM INPUT MTZ FILE ON INDEX 1
* Title: Compare...
* Number of Columns = 9
* Number of Reflections = 864
* Missing value set to NaN in input mtz file
* Column Labels : H K L FP SIGFP FPH SIGPH DPH SIGDPH
* Column Types : H H H F Q F Q D Q
* Cell Dimensions : 94.149 24.170 64.319 90.000 130.367 90.000
* Resolution Range : 0.00078 0.24340 ( 35.806 - 2.027 A )
* Sort Order : 1 2 3 4 0
* Space group = C2 (number 5)
___________________________________________________________________________________________________________________________
- AUTO keyword present. Interpreting input ...
- Seven items found and the last is not FOM. h,k,l,FP,sig(FP),FPH,sig(FPH) for difference Patterson assumed.
- First pass to determine indeces ranges.
- Second (final) pass for output.
- 363 reflections matched and output.
___________________________________________________________________________________________________________________________
Keyword CELL : Cell dimensions set to 94.15 24.17 64.32 90.00 130.37 90.00
Keyword SPACEGROUP : space group number set to 1
Keyword MAP_FORMAT : CCP4 map file selected.
Keyword DIFF_PATT : Difference Patterson map run [h k l FP sig(FP) FPH sig(FPH)].
Keyword PERMUTATION : Permutation set to 3 1 2
Keyword GRID : Grid set to 128 256 1
Keyword GRACYCLES : Plot every 80 cycles.
Keyword GRATWOWINDOWS : Will keep conventional map plot.
Keyword VERBOSE : verbose mode set.
Keyword REFLECTIONS : start reading reflections.
___________________________________________________________________________________________________________________________
Added reflection  -8 +0 +0   +33.019   +72.979 +0.0
Added reflection -10 +0 +0  +132.756  +129.899 +0.0
Added reflection -12 +0 +0   +53.211  +217.441 +0.0
Added reflection -14 +0 +0 +1367.982 +1319.306 +0.0
Added reflection -16 +0 +0 +5958.070 +6806.554 +0.0
Added reflection -18 +0 +0  +277.489  +448.849 +0.0
Added reflection -20 +0 +0  +152.506  +313.580 +0.0
Added reflection -22 +0 +0  +517.969 +1391.308 +0.0
- Trial-and-error definition of initial lambda value.
- Verbose output requested.
- Lambda linearly depended on number of cycles.
- Grid (along fast, medium and slow) 128 256 1
- Axes permutation is fast-Z, medium-X, slow-Y
- About to allocate memory.
- FFTW is learning how to do FFTs ...
- FFTW learned how to do FFTs.
- Saving FFTW's wisdom file ...
There are 371 observations with <F>= 1213.36
- Loading data on arrays ...
- Do not panic yet ...
- About to calculate conventional Fourier transform of data
- Write out conventional Fourier transform (file conventional.map)
(Q)QOPEN allocated # 1
User: glykos Logical Name: conventional.map
Status: UNKNOWN Filename: conventional.map
File name for output map file on unit 1 : conventional.map
logical name conventional.map
FORMATTED OLD file opened on unit 24
Logical name: SYMOP, Full name: /disk2/public/xtal/ccp4/lib/data/symop.lib
Minimum density in map = -268.12149
Maximum density = 999.00000
Mean density = 0.00000
Rms deviation from mean = 80.94175
- Now trying lambda = 0.010000
.............................................
- Initial value for lambda set to 1000.000000
___________________________________________________________________________________________________________________________
- MAXENT starts here
Chi**2 : 1593.822 R : 1.0000 Lambda : 1000.00000 Nobs : 366
Chi**2 : 1588.187 R : 0.9992 Lambda : 1000.00000 Nobs : 366
........................................................................................
Chi**2 : 365.790 R : 0.5621 Lambda : 945.19320 Nobs : 366
803 cycles in 74 seconds, giving an average of 0.092 seconds per cycle.
___________________________________________________________________________________________________________________________
CONVERGENCE ACHIEVED.
The final R-factor between the observed and calculated amplitudes is 0.5621040
(Q)QOPEN status changed from NEW to UNKNOWN for maxent.map
(Q)QOPEN allocated # 1
User: glykos Logical Name: maxent.map
Status: UNKNOWN Filename: maxent.map
File name for output map file on unit 1 : maxent.map
logical name maxent.map
FORMATTED OLD file opened on unit 24
Logical name: SYMOP, Full name: /disk2/public/xtal/ccp4/lib/data/symop.lib
Minimum density in map = 0.12185
Maximum density = 999.00000
Mean density = 26.37672
Rms deviation from mean = 35.99294
Normal termination ? (100 seconds)
Now we have all these files :
crystal2 ~/test d
total 652
-rw-r--r-- 1 glykos sys     68 Dec 16 15:51 CHIcontributions.dat
-rw-r--r-- 1 glykos sys  37224 Dec 16 15:51 CHIcontributions.ps
-rw-r--r-- 1 glykos sys  31101 Dec 16 15:49 MAXENT_AUTO.IN
-rw-r--r-- 1 glykos sys  30595 Dec 16 15:48 MAXENT_FROM_MTZ.in
-rw-r--r-- 1 glykos sys    103 Dec 16 15:48 MAXENT_FROM_MTZ_ANOMALOUS.in
-rw-r--r-- 1 glykos sys  10365 Dec 16 15:48 Normal_probability.ps
-rw-r--r-- 1 glykos sys    825 Dec 16 15:48 Normplot_tails.dat
-rw-r--r-- 1 glykos sys 132176 Dec 16 15:50 conventional.map
-rw-r--r-- 1 glykos sys 262300 Dec 16 15:45 from_scaleit.mtz
-rw-r--r-- 1 glykos sys 132176 Dec 16 15:51 maxent.map
crystal2 ~/test
Both CHIcontributions.dat and Normplot_tails.dat point to problems with reflections 0,0,11 and -12,0,8 :
crystal2 ~/test
crystal2 ~/test
crystal2 ~/test more CHIcontributions.dat
  0   0  11  55.19882
-12   0   8  59.04416
crystal2 ~/test
crystal2 ~/test
crystal2 ~/test more Normplot_tails.dat
  0   0   6  -2.99385  -30.69588
  2   0   4  -2.64107  -28.07780
-12   0   8  -2.46310  -26.91301
  0   0  11  -2.34000  -25.43124
 -4   0   8  -2.24461  -22.29077
  0   0   7  -2.16611  -21.85669
  4   0  10  -2.09905  -18.73302
-16   0   5  +2.04028  +10.47118
  8   0   6  +2.09905  +10.55087
  4   0   4  +2.16611  +10.55754
 -8   0   9  +2.24461  +11.08962
  6   0   3  +2.34000  +11.90654
  4   0   6  +2.46310  +12.23197
-16   0  10  +2.64107  +12.45890
  2   0   6  +2.99385  +13.12762
crystal2 ~/test
crystal2 ~/test
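Matching the two files by eye is easy for a handful of reflections; for longer lists the comparison could be scripted. The sketch below is only an illustration: the helper names are hypothetical, and the only assumption about the file formats is that the first three columns of each line are h, k and l, as in the listings above. The data are copied from those listings.

```python
CHI_LINES = [
    "0 0 11 55.19882",
    "-12 0 8 59.04416",
]

TAILS_LINES = [
    "0 0 6 -2.99385 -30.69588",
    "2 0 4 -2.64107 -28.07780",
    "-12 0 8 -2.46310 -26.91301",
    "0 0 11 -2.34000 -25.43124",
    "-4 0 8 -2.24461 -22.29077",
    "0 0 7 -2.16611 -21.85669",
    "4 0 10 -2.09905 -18.73302",
    "-16 0 5 +2.04028 +10.47118",
    "8 0 6 +2.09905 +10.55087",
    "4 0 4 +2.16611 +10.55754",
    "-8 0 9 +2.24461 +11.08962",
    "6 0 3 +2.34000 +11.90654",
    "4 0 6 +2.46310 +12.23197",
    "-16 0 10 +2.64107 +12.45890",
    "2 0 6 +2.99385 +13.12762",
]


def read_hkl(lines):
    """Collect (h, k, l) from the first three columns of each non-empty line."""
    indices = set()
    for line in lines:
        fields = line.split()
        if len(fields) >= 3:
            indices.add(tuple(int(field) for field in fields[:3]))
    return indices


def flagged_by_both(chi_lines, tails_lines):
    """Reflections that appear in both lists."""
    return read_hkl(chi_lines) & read_hkl(tails_lines)


if __name__ == "__main__":
    for hkl in sorted(flagged_by_both(CHI_LINES, TAILS_LINES)):
        print(*hkl)
```

To use it on the real files, pass `open("CHIcontributions.dat")` and `open("Normplot_tails.dat")` instead of the in-line lists.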
The normal probability plot suggests that all seven reflections in the lower left-hand corner are suspect. Its somewhat sigmoidal shape suggests the presence of non-normally distributed (systematic) errors :
Let's repeat the calculation with these seven reflections excluded. The first step is to create a file with the name REJECT.HKL whose first three columns contain the indices of the reflections to be excluded :
crystal2 ~/test
crystal2 ~/test cp Normplot_tails.dat REJECT.HKL
crystal2 ~/test ed REJECT.HKL
crystal2 ~/test more REJECT.HKL
  0   0   6  -2.99385  -30.69588
  2   0   4  -2.64107  -28.07780
-12   0   8  -2.46310  -26.91301
  0   0  11  -2.34000  -25.43124
 -4   0   8  -2.24461  -22.29077
  0   0   7  -2.16611  -21.85669
  4   0  10  -2.09905  -18.73302
crystal2 ~/test
crystal2 ~/test
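The copy-and-edit step above can also be scripted once you have decided, from the plot, where the outliers begin. The following is a sketch only: the function names and the cutoff of -15.0 are hypothetical choices for this example, and treating the fifth column of Normplot_tails.dat as the plotted (observed) value is an assumption. The four sample rows are copied from the listing above.

```python
SAMPLE_TAILS = [
    "0 0 6 -2.99385 -30.69588",
    "4 0 10 -2.09905 -18.73302",
    "-16 0 5 +2.04028 +10.47118",
    "2 0 6 +2.99385 +13.12762",
]


def select_rows(tail_lines, cutoff):
    """Keep rows whose fifth column (assumed observed value) is below cutoff."""
    kept = []
    for line in tail_lines:
        fields = line.split()
        if len(fields) >= 5 and float(fields[4]) < cutoff:
            kept.append(line.rstrip("\n"))
    return kept


def write_reject_file(rows, path="REJECT.HKL"):
    """Write the selected rows, one per line; only the first three columns matter."""
    with open(path, "w") as handle:
        for row in rows:
            handle.write(row + "\n")


if __name__ == "__main__":
    # The cutoff is a judgment made by looking at the plot, not a program default.
    for row in select_rows(SAMPLE_TAILS, cutoff=-15.0):
        print(row)
```

The cutoff should always come from inspecting the plot; a script like this only saves typing.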
Then, we edit the file MAXENT_AUTO.IN and we add the keyword REJECT :
crystal2 ~/test
crystal2 ~/test ed MAXENT_AUTO.IN
crystal2 ~/test more -20 MAXENT_AUTO.IN
REJECT
CELL 94.14900 24.17000 64.31901 90.00000 130.36700 90.00000
SPACEGROUP 1
MAP_FORMAT CCP4
DIFF_PATT
PERMUTATION 3 1 2
GRID 128 256 1
GRACYCLES 80
GRATWOWINDOWS
REFLECTIONS
-30   0   9   89.88602  3.43968  123.75751  12.84017
-30   0  10  126.17858  3.93975  110.84611  10.25688
-30   0  11   38.71215  5.14720   36.43570  15.66436
-30   0  12  165.68549  4.99690  154.67838   7.42726
-30   0  13   38.65771  4.30664   43.59790  16.74030
-30   0  14  158.72888  4.75254  159.23166   5.49528
-30   0  15   86.40644  3.25414   84.79811  15.51947
-30   0  16  150.11438  4.57498  146.66685   5.11194
-30   0  17  132.07582  4.08662  164.78131   5.11169
-30   0  18   21.89952  8.06613   23.18951  10.87039
.................................................................................
crystal2 ~/test
crystal2 ~/test
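If you prefer not to use an editor, the same change can be made with a few lines of script. This is a sketch, with a hypothetical function name: it places REJECT on the first line because that is where it appears in the listing above, and whether the keyword would also be accepted elsewhere in the file is not something this example establishes.

```python
def add_reject_keyword(text):
    """Return text with a REJECT line prepended unless one is already present."""
    for line in text.splitlines():
        if line.strip() == "REJECT":
            return text
    return "REJECT\n" + text


if __name__ == "__main__":
    # Short stand-in for the contents of MAXENT_AUTO.IN.
    original = "SPACEGROUP 1\nMAP_FORMAT CCP4\nDIFF_PATT\nREFLECTIONS\n"
    print(add_reject_keyword(original), end="")
```

Applying the function twice leaves the text unchanged, so running the script again by mistake does no harm.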
... and we run it again, BUT THIS TIME GIVING AS INPUT THE MAXENT_AUTO.IN FILE :
crystal2 ~/test
crystal2 ~/test GraphEnt MAXENT_AUTO.IN
___________________________________________________________________________________________________________________________
[GraphEnt ASCII-art banner]
Gull, S.F. & Daniell, G.J. (1978), Nature, 272, 686-690
Collins, D.M. (1982), Nature, 298, 49-51
NMG
___________________________________________________________________________________________________________________________
Keyword REJECT : 7 reflections specified in REJECT.HKL.
Keyword CELL : Cell dimensions set to 94.15 24.17 64.32 90.00 130.37 90.00
Keyword SPACEGROUP : space group number set to 1
Keyword MAP_FORMAT : CCP4 map file selected.
Keyword DIFF_PATT : Difference Patterson map run [h k l FP sig(FP) FPH sig(FPH)].
Keyword PERMUTATION : Permutation set to 3 1 2
Keyword GRID : Grid set to 128 256 1
Keyword GRACYCLES : Plot every 80 cycles.
Keyword GRATWOWINDOWS : Will keep conventional map plot.
Keyword REFLECTIONS : start reading reflections.
Reflection rejected : -12 0 8
Reflection rejected : -4 0 8
Reflection rejected : 0 0 6
Reflection rejected : 0 0 7
Reflection rejected : 0 0 11
Reflection rejected : 2 0 4
Reflection rejected : 4 0 10
___________________________________________________________________________________________________________________________
...........................................................................................................................
Normal termination ? (32 seconds)
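A quick sanity check after such a run is to confirm that every reflection listed in REJECT.HKL was actually reported as rejected. The sketch below assumes that the program output has been saved as text with one message per line; the function name is hypothetical. The data are copied from the output above and from the REJECT.HKL listing.

```python
LOG_LINES = [
    "Reflection rejected : -12 0 8",
    "Reflection rejected : -4 0 8",
    "Reflection rejected : 0 0 6",
    "Reflection rejected : 0 0 7",
    "Reflection rejected : 0 0 11",
    "Reflection rejected : 2 0 4",
    "Reflection rejected : 4 0 10",
]

REQUESTED = {
    (0, 0, 6), (2, 0, 4), (-12, 0, 8), (0, 0, 11),
    (-4, 0, 8), (0, 0, 7), (4, 0, 10),
}


def rejected_in_log(log_lines):
    """Collect (h, k, l) from every 'Reflection rejected :' message."""
    prefix = "Reflection rejected :"
    found = set()
    for line in log_lines:
        line = line.strip()
        if line.startswith(prefix):
            h, k, l = line[len(prefix):].split()[:3]
            found.add((int(h), int(k), int(l)))
    return found


if __name__ == "__main__":
    missing = REQUESTED - rejected_in_log(LOG_LINES)
    print("not rejected:", sorted(missing))
```

An empty "not rejected" list means that all requested reflections were found and excluded.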