The social need for realistic atmospheric simulation in weather prediction, climate change attribution, seasonal forecasting, and climate projection is great. To obtain realistic simulations, we need more physical processes included in the model with greater fidelity and finer spatial resolution. Spatial resolution primarily drives the need for computational resources because reducing the model grid spacing by a factor f requires f 4 times more computation (assuming 3-D refinement). This compute power comes from large parallel machines with 10,000s of separate nodes and accelerators such as graphics processing units (GPUs) making efficiency a complicated problem.
Efficiency parallel integration algorithms need low internode communication, minimal synchronization, large time steps, and clustered computation. To this end, we propose new characteristics-based methods for the atmospheric dynamical equations with these properties in mind. These schemes are capable of simulating at a large CFL time step in only one stage of computations, needing only one copy of the state variables. They are implemented in a 2-D non-hydrostatic compressible equation set in an x-z (horizontal-vertical) Cartesian plane to simulate buoyancy-driven flows such as rising thermals and internal gravity waves.
The schemes are implemented to run on CPU and multi-GPU architectures using Nvidia's CUDA (Compute Unified Device Architecture) language to test relative efficiency. Even with- out memory tuning, the GPU code showed roughly 2.5x (5x) better performance per Watt. With optimization, this could increase by an order of magnitude.
The methods can use any spatial interpolant, so two major formulations are proposed and tested. One uses WENO interpolants which are pre-computed, and the other uses standard polynomials and computes them on-the-fly. The advantage of on-the-fly calculations is a significant reduction in the volume of data communicated to and from the GPU's slow global memory. In some cases, the runtime on GPUs of the same problem size cut in half because of this reduction in the amount of data communicated.
Regarding accuracy, a series of modifications were proposed and implemented in the hopes of improving accuracy. These included higher-order accurate trajectories and inclusion of the gravity source term in the characteristic variables and trajectories. When testing the comparative accuracies in a smooth gravity waves test case, the modifications showed from 10% to 20% improvement. Particularly, higher-order accurate trajectories yielded the best improvement for computational cost. It was found, however, that the amount of diffusion included in the scheme controlled the accuracy the most.
Overall, these schemes have proven viable for atmospheric simulation up to a CFL value near two when the hyperdiffusion is properly tuned. Compared to a three-stage scheme that sub-cycles fast waves twice per stage (for CFL = 2), this method requires one-sixth the synchronization points, one-third the communication, and one-half the memory requirements. Also, all computations are clustered into a single stage.
Beyond CFL values near two, instabilities require diffusion levels that overdamp the smaller scales to an unacceptable level. The potential causes of these instabilities are discussed, and future work is proposed to ascertain the cause and address it. Also, the role of these methods within a full 3-D non-hydrostatic dynamical core is discussed, and the milestones for extension to a full atmospheric model are given.