Atomic Descriptors
==================
AtomicAI provides locally-averaged atomic fingerprints (LAAF) that encode the
chemical environment of each atom. These descriptors are used as input features
for machine learning models.
Command-line usage
------------------
.. code-block:: bash
generate_descriptors trajectory.xyz [--descriptor TYPE [TYPE ...]] [--n-eta N]
Options
~~~~~~~
.. list-table::
:header-rows: 1
:widths: 20 15 65
* - Option
- Default
- Description
* - ``--descriptor`` / ``-d``
- ``ACSF_G2 ACSF_G2G4 SOAP``
- One or more descriptor types to compute
* - ``--n-eta`` / ``-n``
- ``50``
- Number of eta decay functions
Descriptor types
----------------
ACSF_G2 — Radial symmetry functions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Two-body Behler-Parrinello G2 functions. Each function is a Gaussian in
interatomic distance, parameterised by eta (width) and R\ :sub:`s` (shift):
.. math::
G^2_i = \sum_j e^{-\eta (r_{ij} - R_s)^2} \cdot f_c(r_{ij})
where :math:`f_c` is a cosine cutoff function.
.. code-block:: bash
generate_descriptors traj.xyz --descriptor ACSF_G2 --n-eta 80
ACSF_G3 — Cosine basis functions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
G3 functions use a cosine basis parameterised by kappa:
.. math::
G^3_i = \sum_j \cos(\kappa \cdot r_{ij}) \cdot f_c(r_{ij})
.. code-block:: bash
generate_descriptors traj.xyz --descriptor ACSF_G3
ACSF_G4 — Angular symmetry functions (with r\ :sub:`jk`)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Three-body functions that encode bond angles. The r\ :sub:`jk` term is included
in the cutoff and exponent sum:
.. math::
G^4_i = 2^{1-\zeta} \sum_{j,k \neq i} (1 + \lambda \cos\theta_{jik})^\zeta
\cdot e^{-\eta(r_{ij}^2 + r_{ik}^2 + r_{jk}^2)} \cdot f_c(r_{ij}) f_c(r_{ik}) f_c(r_{jk})
.. code-block:: bash
generate_descriptors traj.xyz --descriptor ACSF_G4
ACSF_G5 — Angular symmetry functions (without r\ :sub:`jk`)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Similar to G4 but the r\ :sub:`jk` distance is not included, making it faster
to compute for large systems:
.. math::
G^5_i = 2^{1-\zeta} \sum_{j,k \neq i} (1 + \lambda \cos\theta_{jik})^\zeta
\cdot e^{-\eta(r_{ij}^2 + r_{ik}^2)} \cdot f_c(r_{ij}) f_c(r_{ik})
.. code-block:: bash
generate_descriptors traj.xyz --descriptor ACSF_G5
ACSF_G2G4 — Combined radial + angular (recommended)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Concatenates G2 and G4 vectors to produce a complete two-body + three-body
descriptor. This is generally the best balance of accuracy and cost.
.. code-block:: bash
generate_descriptors traj.xyz --descriptor ACSF_G2G4 --n-eta 60
ACSF_G2G4G5 — Full combined descriptor
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Concatenates G2 + G4 + G5. Provides the richest angular description.
.. code-block:: bash
generate_descriptors traj.xyz --descriptor ACSF_G2G4G5
SOAP — Smooth Overlap of Atomic Positions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Rotationally invariant descriptor based on the overlap of atomic density
functions, computed via `DScribe `_.
.. code-block:: bash
generate_descriptors traj.xyz --descriptor SOAP
MBSF — Many-body symmetry functions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Combines a radial term (gr, G2-like) with an angular term (ga) that includes
:math:`\zeta`, :math:`\theta_s`, eta, and R\ :sub:`s` parameters.
.. code-block:: bash
generate_descriptors traj.xyz --descriptor MBSF
Output
------
Descriptor files are written to ``./descriptors/`` with the naming convention::
____.dat
Each row is one averaged fingerprint vector for a single atom. The cutoff values
(in Å) come from the built-in ``descriptor_cutoff`` table in
``AtomicAI/data/data_lib.py``.
Running multiple types
----------------------
You can compute several descriptor types in a single call — they run in parallel
using Python multiprocessing:
.. code-block:: bash
generate_descriptors traj.xyz --descriptor ACSF_G2 ACSF_G2G4 SOAP MBSF --n-eta 50