Examples¶
Introduction¶
This section demonstrates SIMPLE-NN with examples.
Example files are in SIMPLE-NN/examples/.
In this example, snapshots from a 500 K MD trajectory of
amorphous SiO2 (60 atoms) are used as the training set.
Note
Since the reference file in str_list is given as a relative path,
you need to move to the directory indicated in each section below to run the examples.
Generate NNP¶
To generate an NNP using symmetry functions and a neural network,
you need three types of input files (input.yaml, str_list, params_XX),
as described in the Tutorials section.
The example files other than params_Si and params_O are introduced below.
Details of params_Si and params_O can be found in the Symmetry function section.
Input files introduced in this section can be found in SIMPLE-NN/examples/SiO2/generate_NNP.
# input.yaml
generate_features: true
preprocess: true
train_model: true
atom_types:
  - Si
  - O

symmetry_function:
  params:
    Si: params_Si
    O: params_O

neural_network:
  method: Adam
  nodes: 30-30
  batch_size: 10
  total_iteration: 50000
  learning_rate: 0.001

# str_list
../ab_initio_output/OUTCAR_comp ::10
With this input file, SIMPLE-NN calculates the feature vectors and their derivatives (generate_features),
generates the training/validation datasets (preprocess), and optimizes the network (train_model).
A sample VASP OUTCAR file (compressed to reduce the file size) is in SIMPLE-NN/examples/SiO2/ab_initio_output.
Snapshots are sampled from the MD trajectory at intervals of 10 MD steps.
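The `::10` index after the OUTCAR path in str_list can be read like a Python slice (start:stop:step). The sketch below assumes the index follows ordinary Python slice semantics; the helper name `select_snapshots` and the trajectory length of 95 steps are hypothetical and only serve the illustration.

```python
def select_snapshots(n_steps, index="::10"):
    """Return the snapshot indices picked by a 'start:stop:step' string."""
    # Empty fields mean "use the default", exactly as in a Python slice.
    parts = [int(p) if p else None for p in index.split(":")]
    return list(range(n_steps))[slice(*parts)]


# Hypothetical trajectory of 95 MD steps (not the length of the example OUTCAR).
print(select_snapshots(95))
```

Under this assumption, every tenth snapshot is used, starting from the first one.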
In this example, each atom is described by 70 symmetry functions: 8 radial symmetry functions per 2-body combination
and 18 angular symmetry functions per 3-body combination.
Thus, this model uses a 70-30-30-1 network for both Si and O.
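The count of 70 inputs can be checked with a few lines of arithmetic. This sketch assumes that, for a given central atom, a 2-body combination is identified by the neighbor species and a 3-body combination by an unordered pair of neighbor species; the variable names are illustrative.

```python
from itertools import combinations_with_replacement

elements = ["Si", "O"]
n_radial_per_pair = 8        # radial functions per 2-body combination
n_angular_per_triplet = 18   # angular functions per 3-body combination

# For one central atom: a 2-body combination is fixed by the neighbor species,
# a 3-body combination by an unordered pair of neighbor species.
two_body = list(elements)
three_body = list(combinations_with_replacement(elements, 2))

n_inputs = (n_radial_per_pair * len(two_body)
            + n_angular_per_triplet * len(three_body))
print(n_inputs)  # 8*2 + 18*3 = 70
```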
The network is optimized with the Adam optimizer, using a learning rate of 0.001 and a batch size of 10.
Output files can be found in SIMPLE-NN/examples/SiO2/generate_NNP/outputs.
In this folder, the generated dataset is stored in the data folder,
and the execution log and energy/force RMSE are stored in LOG.
Potential test¶
Generate test dataset¶
Generating a test dataset works the same way as generating a training/validation dataset.
In this example, we use the same VASP OUTCAR file to generate the test dataset.
Input files introduced in this section can be found in SIMPLE-NN/examples/SiO2/generate_test_data.
# input.yaml
generate_features: true
preprocess: true
train_model: false
atom_types:
  - Si
  - O

symmetry_function:
  params:
    Si: params_Si
    O: params_O
  valid_rate: 0.
In this case, train_model is set to false
because no training is required to generate a test dataset.
In addition, valid_rate is set to 0.
str_list is the same as in the Generate NNP section.
Note
To avoid overwriting the existing training/validation dataset, create the test dataset in a new folder.
Error check¶
To check the error for the test dataset, use the settings below.
To run in test mode, copy the train_list file generated in the Generate test dataset section
to SIMPLE-NN/examples/SiO2/error_check and rename it to test_list.
Edit the paths to the data directory in test_list accordingly.
In this example, ./data/training_data_0000_to_0006.tfrecord
should be changed to ../generate_test_data/data/training_data_0000_to_0006.tfrecord.
Also, copy scale_factor and params_* to the current directory.
These files contain information about the dataset, so they have to be carried along with it.
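The copy-and-edit step for test_list can be scripted. The following is a minimal sketch, assuming that each line of train_list holds exactly one relative path and nothing else; the function name `make_test_list` is hypothetical and not part of SIMPLE-NN.

```python
import os


def make_test_list(train_list, test_list, data_root):
    """Copy train_list to test_list, re-rooting each relative record path."""
    with open(train_list) as src:
        lines = [line.strip() for line in src if line.strip()]
    with open(test_list, "w") as dst:
        for path in lines:
            # './data/x.tfrecord' -> '<data_root>/data/x.tfrecord'
            relative = os.path.normpath(path)
            joined = os.path.join(data_root, relative)
            dst.write(joined.replace(os.sep, "/") + "\n")
    return len(lines)
```

For this example, it could be called from the error_check directory as `make_test_list("../generate_test_data/train_list", "test_list", "../generate_test_data")`. Check the resulting file by hand before running SIMPLE-NN.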
Input files introduced in this section can be found in SIMPLE-NN/examples/SiO2/error_check.
# input.yaml
generate_features: false
preprocess: false
train_model: true
atom_types:
  - Si
  - O

symmetry_function:
  params:
    Si: params_Si
    O: params_O

neural_network:
  method: Adam
  nodes: 30-30
  batch_size: 10
  train: false
  test: true
  continue: true
Note
To use the option continue: true, you need to rename SAVER_iterationXXXX.* to SAVER.*
and modify the checkpoints file (remove '_iterationXXXX' in the text).
If you use the option continue: weights,
rename potential_saved_iterationXXXX to potential_saved.
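The renaming described in this note can be automated. The sketch below is a hypothetical helper, not part of SIMPLE-NN; it assumes the text file to edit is TensorFlow's usual index file named `checkpoint` (pass another name through `checkpoint_file` if yours differs). Running it on a copy of the output directory first is advisable.

```python
import glob
import os
import re


def strip_iteration_tag(directory, iteration, checkpoint_file="checkpoint"):
    """Rename SAVER_iteration<N>.* to SAVER.* and clean the checkpoint index."""
    tag = "_iteration{}".format(iteration)
    renamed = []
    for old in glob.glob(os.path.join(directory, "SAVER" + tag + ".*")):
        new_name = os.path.basename(old).replace(tag, "", 1)
        os.rename(old, os.path.join(directory, new_name))
        renamed.append(new_name)
    # Assumption: the index is a small text file that names the saved model.
    index_path = os.path.join(directory, checkpoint_file)
    if os.path.exists(index_path):
        with open(index_path) as handle:
            text = handle.read()
        # Remove the tag only when it is not a prefix of a longer number.
        text = re.sub(re.escape(tag) + r"(?!\d)", "", text)
        with open(index_path, "w") as handle:
            handle.write(text)
    return sorted(renamed)
```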
After running SIMPLE-NN with the settings above,
a new output file named test_result is generated.
The file is in pickle format and can be opened with the Python code below:
from six.moves import cPickle as pickle

# Python 2
with open('test_result', 'rb') as fil:
    res = pickle.load(fil)

# Python 3
with open('test_result', 'rb') as fil:
    res = pickle.load(fil, encoding='latin1')
The file contains the DFT energies/forces and the NNP energies/forces.
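Once the reference and predicted values are loaded, an error metric such as the RMSE is easy to compute. The sketch below uses made-up numbers because the key names inside `res` are not listed here; inspect the loaded object (for example with `print(res.keys())`, assuming it is dictionary-like) before substituting real data. The function `rmse` is illustrative, not a SIMPLE-NN utility.

```python
import math


def rmse(reference, predicted):
    """Root-mean-square error between two equally long sequences of numbers."""
    if len(reference) != len(predicted):
        raise ValueError("sequences must have the same length")
    squared = [(r - p) ** 2 for r, p in zip(reference, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Made-up numbers for illustration only; replace them with flattened values
# taken from `res` after checking which keys it actually contains.
dft_energy = [-7.90, -7.85, -7.92]
nnp_energy = [-7.91, -7.83, -7.92]
print(rmse(dft_energy, nnp_energy))
```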
Principal component analysis¶
SIMPLE-NN provides principal component analysis (PCA) as a method for preprocessing the input descriptor vector. The components of an input descriptor vector, such as Behler-type symmetry functions, are often highly correlated. In that case, decorrelating the input descriptor vector with PCA before feeding it to a machine-learning model can give much faster convergence.
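To illustrate what decorrelation by PCA does, here is a minimal NumPy sketch on toy data. It is not SIMPLE-NN's implementation: the function `pca_whiten` is hypothetical, and using `min_whiten_level` as a small floor added to each variance before rescaling is an assumption about what such a parameter might do.

```python
import numpy as np


def pca_whiten(features, min_whiten_level=1.0e-8):
    """Project centered features on principal axes and rescale to unit variance."""
    centered = features - features.mean(axis=0)
    covariance = centered.T @ centered / centered.shape[0]
    variances, axes = np.linalg.eigh(covariance)
    # Assumption: the floor keeps near-zero variances from blowing up the scale.
    scale = np.sqrt(variances + min_whiten_level)
    return centered @ axes / scale


rng = np.random.default_rng(0)
base = rng.normal(size=(2000, 1))
# Two strongly correlated toy descriptor components (not real symmetry functions).
toy = np.hstack([base + 0.1 * rng.normal(size=(2000, 1)),
                 2.0 * base + 0.1 * rng.normal(size=(2000, 1))])
whitened = pca_whiten(toy)
print(np.round(np.cov(whitened, rowvar=False, bias=True), 3))
```

The covariance matrix of the transformed data is close to the identity, which means the components are uncorrelated and have comparable scale.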
To use PCA, add the following lines to input.yaml
both when you run preprocessing and when you run training and testing.
For detailed descriptions of input parameters, see here.
neural_network:
  pca: true
  pca_whiten: true
  pca_min_whiten_level: 1.0e-8
A pickle file named pca will be generated during preprocessing. You need to copy the pca
file to the directory where you run SIMPLE-NN with the trained model, just like the scale_factor file.
Parameter tuning¶
GDF¶
The Gaussian density function (GDF) [1] is used to reduce the force errors of sparsely sampled atoms.
To use GDF, you need to calculate \(\rho(\mathbf{G})\)
by adding the following lines to the symmetry_function section in input.yaml.
SIMPLE-NN supports an automatic parameter generation scheme for \(\sigma\) and \(c\).
Use the setting sigma: Auto
to get robust values of \(\sigma\) and \(c\) (the values are stored in the LOG file).
Input files introduced in this section can be found in SIMPLE-NN/examples/SiO2/parameter_tuning_GDF.
#symmetry_function:
  #continue: true # if individual pickle file is not deleted
  atomic_weights:
    type: gdf
    params:
      sigma: Auto
      # for manual setting
      #  Si: 0.02
      #  O: 0.02
\(\rho(\mathbf{G})\) indicates the density of each training point.
After \(\rho(\mathbf{G})\) is calculated, histograms of \(\rho(\mathbf{G})^{-1}\)
are also saved in files named GDFinv_hist_XX.pdf.
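The idea behind \(\rho(\mathbf{G})\) can be illustrated with a small kernel-density sketch. The Gaussian kernel below, with the squared distance divided by \(2\sigma^2\) times the descriptor dimension, is an assumed form for illustration; the normalization used inside SIMPLE-NN may differ. The toy points and the function `gaussian_density` are hypothetical.

```python
import math


def gaussian_density(points, sigma):
    """Kernel density of each point among all points (Gaussian kernel)."""
    dim = len(points[0])
    densities = []
    for g in points:
        total = 0.0
        for other in points:
            dist2 = sum((a - b) ** 2 for a, b in zip(g, other))
            # Assumed kernel: squared distance scaled by dimension and 2*sigma^2.
            total += math.exp(-dist2 / (2.0 * sigma ** 2 * dim))
        densities.append(total / len(points))
    return densities


# Toy 2-D "descriptor" points: three clustered environments and one outlier.
toy_points = [(0.00, 0.00), (0.01, 0.00), (0.00, 0.01), (0.50, 0.50)]
rho = gaussian_density(toy_points, sigma=0.02)
inverse = [1.0 / r for r in rho]
mean_inverse = sum(inverse) / len(inverse)
scaled = [v / mean_inverse for v in inverse]  # rescaled to an average of 1
print(scaled)
```

The isolated point receives the largest \(\rho(\mathbf{G})^{-1}\), so it would carry the largest weight in the force loss.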
Note
If there is a peak in the high \(\rho(\mathbf{G})^{-1}\) region of the histogram, increasing the Gaussian weight (\(\sigma\)) is recommended until the peak is removed. Conversely, if multiple peaks appear in the low \(\rho(\mathbf{G})^{-1}\) region of the histogram, reducing \(\sigma\) is recommended until the peaks merge.
In the default setting, the set of \(\rho(\mathbf{G})^{-1}\) values is scaled to have an average value of 1. The interval-averaged force error with respect to \(\rho(\mathbf{G})^{-1}\) can be visualized with the following script.
from simple_nn.utils import graph as grp
grp.plot_error_vs_gdfinv(['Si','O'], 'test_result')
where test_result is the output file generated by Error check.
The graph of interval-averaged force errors with respect to
\(\rho(\mathbf{G})^{-1}\) is saved as ferror_vs_GDFinv_XX.pdf.
If the default GDF is not sufficient to reduce the force error of sparsely sampled training points,
one can use a scale function to increase the effect of GDF. In the scale function,
\(b\) controls the decay rate for low \(\rho(\mathbf{G})^{-1}\) and
\(c\) separates highly concentrated and sparsely sampled training points.
To use the scale function, add the following lines to the symmetry_function section in input.yaml.
#symmetry_function:
  weight_modifier:
    type: modified sigmoid
    params:
      Si:
        b: 0.02
        c: 3500.
      O:
        b: 0.02
        c: 10000.
In our experience, \(b=1.0\) and an automatically selected \(c\) give reasonable results.
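The roles of \(b\) and \(c\) can be made concrete with a small sketch. The functional form below, the input multiplied by a logistic switch centered at \(c\) with slope \(b\), is an assumption chosen to match the description of the two parameters; it is not taken from the SIMPLE-NN source, so check the code for the form actually used.

```python
import math


def modified_sigmoid(x, b, c):
    """Damp x when it lies below c; leave it nearly unchanged far above c."""
    z = b * (x - c)
    # Numerically stable logistic switch (assumed form, see the text above).
    if z >= 0:
        switch = 1.0 / (1.0 + math.exp(-z))
    else:
        e = math.exp(z)
        switch = e / (1.0 + e)
    return x * switch


# b and c for Si are the values from the input.yaml fragment shown earlier.
for x in (1000.0, 3500.0, 6000.0):
    print(x, modified_sigmoid(x, b=0.02, c=3500.0))
```

With this form, a larger \(b\) makes the transition around \(c\) sharper, and \(c\) sets the value of \(\rho(\mathbf{G})^{-1}\) above which a training point keeps its full weight.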
To check the effect of the scale function, use the following script to visualize the
force error distribution as a function of \(\rho(\mathbf{G})^{-1}\).
In the script below, test_result_noscale
is the test result file from training without the scale function and
test_result_wscale
is the test result file from training with the scale function.
from simple_nn.utils import graph as grp
grp.plot_error_vs_gdfinv(['Si','O'], 'test_result_noscale', 'test_result_wscale')
[1] W. Jeong, K. Lee, D. Yoo, D. Lee and S. Han, J. Phys. Chem. C 2018, 122, 22790
Uncertainty Estimation¶
A replica ensemble [2] is used to estimate the atomic-resolution uncertainty. Please read the paper for details. We recommend making an independent directory for each step.
Note
Before following the steps below, you need to have prepared *.pickle files in path/data/.
If not, please run SIMPLE-NN with the options below first.
#input.yaml
generate_features: true
preprocess: false
train_model: false
symmetry_function:
  remain_pickle: true  # default: false
Step 1. Extract the atomic energy¶
Extract the atomic energies that will be used as the reference for the replicas.
Make test_list as described in Potential test and prepare potential_saved.
#input.yaml
generate_features: false
preprocess: false
train_model: true
neural_network:
  NNP_to_pickle: true
  test: false
  train: false
  continue: true  # or weights
Step 2. Write the data into tfrecord¶
Convert the *.pickle files into tfrecord files to feed the input data during training.
#input.yaml
generate_features: false
preprocess: true
train_model: false
symmetry_function:
  add_NNP_ref: true
  continue: true
Step 3. Train with atomic energy¶
Train the model with the atomic energies only to speed up training (use_force and use_stress are false). Choose a suitable number of nodes and a suitable standard deviation for the initial weights. Repeat this step several times, changing the number of nodes each time.
#input.yaml
generate_features: false
preprocess: false
train_model: true
neural_network:
  NNP_to_pickle: false
  use_force: false
  use_stress: false
  nodes: (user's choice)
  test: false
  train: true
  continue: false
  E_loss: 3
  weight_initializer:
    params:
      stddev: (user's choice)
symmetry_function:
  add_NNP_ref: true
  continue: true
Step 4. Molecular dynamics¶
Note
Before this step, you have to compile LAMMPS with pair_nn_replica.cpp and pair_nn_replica.h.
LAMMPS can calculate the atomic uncertainty as the standard deviation of the atomic energies. Because our NNP does not deal with charged systems, the atomic uncertainty can be written to the atomic charge. Prepare your data file in the charge format and modify your LAMMPS input as in the example below.
atom_style charge
pair_style nn/r (# of replica potentials)
pair_coeff * * (reference potential) (element1) (element2) ... &
           (replica potential #1) &
           (replica potential #2) &
           ...
compute (ID) (group-ID) property/atom q
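The quantity stored in the charge field can be pictured as a per-atom spread over the replica predictions. The sketch below uses made-up energies and a population standard deviation; which estimator the LAMMPS pair style uses, and whether the reference potential is counted among the replicas, are assumptions to verify against the source. The function `atomic_uncertainty` is hypothetical.

```python
import math


def atomic_uncertainty(replica_energies):
    """Per-atom standard deviation over replicas.

    replica_energies[k][i] is the energy of atom i predicted by replica k.
    """
    n_replicas = len(replica_energies)
    n_atoms = len(replica_energies[0])
    result = []
    for i in range(n_atoms):
        values = [replica_energies[k][i] for k in range(n_replicas)]
        mean = sum(values) / n_replicas
        # Population standard deviation (an assumption, see the text above).
        variance = sum((v - mean) ** 2 for v in values) / n_replicas
        result.append(math.sqrt(variance))
    return result


# Made-up atomic energies (eV) for 3 replicas and 2 atoms.
toy_energies = [[-5.00, -4.10],
                [-5.00, -4.30],
                [-5.00, -4.20]]
sigma_atom = atomic_uncertainty(toy_energies)
print(sigma_atom)
```

Atoms on which the replicas agree get an uncertainty near zero, while atoms with scattered predictions stand out.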
[2] W. Jeong, D. Yoo, K. Lee, J. Jung and S. Han, J. Phys. Chem. Lett. 2020, 11, 6090-6096