Type of Solvers and Solution Control Parameters
This section deals with solution controls for solvers, including topics such as CFL Number, Time-step for Transient Simulations, Pseudo-time Marching, Parallel Computing, Nodes and Clusters, HPC - High Performance Computing, Threading, Partitioning, MPI - Message Passing Interface and Scalability.
The solver setting process encompasses the following aspects of numerical solutions:
- Discretization scheme for momentum, pressure, energy and turbulence parameters
- PV-Coupling such as SIMPLE, SIMPLER, PISO
- Conservation target and residual levels for convergence criteria. In addition to the "residual norm", it is strongly recommended to set "mass-conservation" or "energy-conservation" as an additional convergence check.
- Fluid time-scales
- Solid time-scales (in case of conjugate heat transfer or pure conduction problems). The recommended practice is to set the solid time-scale an order of magnitude (10 times) higher than the fluid time-scale.
- Ensure that the solution is mesh-independent; use mesh adaption to modify the mesh, or create additional meshes, for the mesh-independence study. A mesh-sensitive result indicates that discretization error, such as "False Diffusion", is still significant.
- The node-based averaging scheme is known to be more accurate than the default cell-based scheme for unstructured meshes, most notably for triangular and tetrahedral meshes.
- Note that for coupled solvers which solve pseudo-transient equations even for steady state problems (such as CFX), relaxation factors are not required to be set. The solver applies a false timestep as the convergence process is iterated towards the final solution.
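The combined convergence check and the solid time-scale rule of thumb from the list above can be sketched in a few lines of Python. This is only an illustration: the function names, the tolerance values (1e-4 for residuals, 1% for mass imbalance) and the dictionary layout are assumptions made for the example and do not correspond to any solver's API.

```python
def is_converged(residuals, mass_in, mass_out,
                 residual_tol=1e-4, imbalance_tol=0.01):
    """Return True only if residuals AND mass conservation meet their targets.

    residuals: dict mapping an equation name to its residual norm (hypothetical layout).
    mass_in, mass_out: total mass flow entering / leaving the domain (mass_in != 0).
    """
    # Residual criterion: every monitored equation must be below the tolerance.
    residuals_ok = all(r < residual_tol for r in residuals.values())
    # Conservation criterion: relative mass imbalance across the domain.
    imbalance = abs(mass_in - mass_out) / abs(mass_in)
    return residuals_ok and imbalance < imbalance_tol


def solid_time_scale(fluid_time_scale, factor=10.0):
    """Rule of thumb from the text: solid time-scale about 10x the fluid time-scale."""
    return factor * fluid_time_scale
```

With these assumed tolerances, a run whose residuals are all below 1e-4 but whose outlet mass flow differs from the inlet by 10% would still be reported as not converged, which is the point of adding a conservation check.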
Differences between co-located (non-staggered) and staggered grid layout
- For co-located solvers (such as ANSYS CFX), control volumes are identical for all transport equations, continuity as well as momentum.
- CFX is a node-based solver and constructs control volumes around the nodes from element sectors, thus the number of control volumes is equal to the number of nodes and not the number of elements. The input of CFX may consist of tetrahedron, prism and hexahedron elements in physical form, but the solver internally generates a polyhedral mesh. In a cell-centered code, such as FLUENT or STAR-CD / STAR-CCM+, the number of elements is the same as the number of control volumes (as the control volumes are the same as the physical elements), so these terms are often used interchangeably.
- All field (or unknown or solution) variables as well as fluid properties are stored at the nodes (the vertices on the mesh).
- The distinction between colocated and staggered approach is nicely explained in section 4.6 by S. V. Patankar in his book "Numerical Heat Transfer and Fluid Flow" and section 7.2 in the book Computational Methods for Fluid Dynamics by J. H. Ferziger and M. Peric.
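- For cell-centred solvers like ANSYS FLUENT, values of scalar field variables as well as material properties are stored at cell centres. Note that FLUENT is itself a co-located solver (pressure and velocity are both stored at cell centres); in a true staggered layout, velocity components are stored at cell faces while scalars are stored at cell centres.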
- Because of the difference in where field variables are stored, for a simulation with the same mesh, material properties and boundary conditions, the Y+ value reported in ANSYS CFX will be approximately twice that reported in FLUENT.
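To make the difference in control-volume counts concrete, the sketch below counts elements and nodes for a structured hexahedral block. The function name and the choice of a simple nx x ny x nz block are assumptions made for illustration; real meshes (especially tetrahedral ones) have very different node-to-element ratios.

```python
def structured_hex_counts(nx, ny, nz):
    """Return (elements, nodes) for a structured block of nx*ny*nz hexahedra."""
    elements = nx * ny * nz                    # control volumes in a cell-centred code
    nodes = (nx + 1) * (ny + 1) * (nz + 1)     # control volumes in a node-based code
    return elements, nodes


elements, nodes = structured_hex_counts(10, 10, 10)
print("cell-centred control volumes:", elements)
print("node-based control volumes:  ", nodes)
```

For this block the node-based count is larger than the element count, so the same mesh file leads to a different number of unknowns depending on the solver's storage scheme.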
Some more excerpts from user manuals of commercial tools
- Smaller physical timesteps are more robust than larger ones.
- An Isothermal simulation is more robust than modeling heat transfer. The Thermal Energy model is more robust than the Total Energy Model.
- Velocity or mass specified boundary conditions are more robust than pressure specified boundary conditions. A Static pressure boundary is more robust than a total pressure boundary.
- If the characteristic time scale is not simply the advection time of the problem, there may be transient effects holding up convergence. Heat transfer or combustion processes may take a long time to convect through the domain or settle out. There may also be vortices caused by the initial guess, which take longer to move through the entire solution domain. In these cases, a larger timestep may be needed to push things through initially, followed by a smaller timestep to ensure convergence on the small time scale physics. If the large timestep results in solver instability, then a small time scale should be used and more iterations may be required.
- Sometimes the levels of turbulence in the domain can affect convergence. If the level of turbulence is non-physically too low, then the flow might be "too thin" and transient flow effects may be dominating. Conversely if the level of turbulence is non-physically too high then the flow might be "too thick" and cause unrealistic pressure changes in the domain. It is wise to look at the Eddy Viscosity and compare it to the dynamic (molecular) viscosity. Typically the Eddy Viscosity is of the order of 1000 times the dynamic viscosity, for a fully turbulent flow.
- The 2nd Order High Resolution advection scheme has the desirable property of giving 2nd order accurate gradient resolution while keeping solution variables physically bounded. However, it may cause convergence problems for some cases due to the nonlinearity of the Beta value. If you are running High Res and are having convergence difficulty, try reducing your timestep. If you still have problems converging, try switching to a Specified Blend Factor of 0.75 and gradually increasing the Blend Factor to as close to 1.0 as possible.
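The eddy-viscosity check mentioned above amounts to a simple ratio. The sketch below assumes you have exported the eddy (turbulent) viscosity at some location of interest; the function name and the sample values (a dynamic viscosity of about 1.8e-5 Pa s, roughly that of air at room temperature) are assumptions for illustration only.

```python
def eddy_viscosity_ratio(mu_t, mu):
    """Ratio of eddy (turbulent) viscosity to dynamic (molecular) viscosity."""
    if mu <= 0.0:
        raise ValueError("dynamic viscosity must be positive")
    return mu_t / mu


mu_air = 1.8e-5      # Pa s, approximate value for air (assumed)
mu_t_sample = 1.8e-2  # Pa s, hypothetical eddy viscosity read from a result file
ratio = eddy_viscosity_ratio(mu_t_sample, mu_air)
print(f"eddy viscosity ratio: {ratio:.0f}")
```

A ratio of the order of 1000 is what the text describes as typical for fully turbulent flow; values that are orders of magnitude lower or higher would be a reason to review the turbulence boundary and initial conditions.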
Solver Setting for Transient Simulation: CFL Number
- The CFL number scales the time-step sizes that are used for the time-marching scheme of the flow solver.
- A higher value leads to faster convergence but can lead to divergence and unstable simulations.
- The converse also holds: choosing a smaller value in an unstable simulation improves stability and helps the solution converge.
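The text above does not state a formula, so the sketch below assumes the classical one-dimensional definition CFL = |u| Δt / Δx, where u is the local velocity, Δt the time-step and Δx the cell size. The function names are hypothetical; commercial solvers may define their CFL number differently (for example, including the speed of sound for compressible flow).

```python
def cfl_number(velocity, dt, dx):
    """Classical 1D CFL number: |u| * dt / dx (assumed definition)."""
    return abs(velocity) * dt / dx


def time_step_for_cfl(target_cfl, velocity, dx):
    """Time-step that gives the requested CFL number for a given cell size."""
    return target_cfl * dx / abs(velocity)


# Example: 10 m/s flow through 1 mm cells with a 0.1 ms time-step.
print(cfl_number(10.0, 1e-4, 1e-3))
# Halving the target CFL halves the time-step for the same mesh and velocity.
print(time_step_for_cfl(0.5, 10.0, 1e-3))
```

Under this definition, reducing the CFL number on a fixed mesh is equivalent to reducing the time-step, which is why a lower value tends to stabilize a diverging run at the cost of more iterations.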
Mesh Reordering - Node Renumbering
Both elements and nodes are numbered, and each element is described by the set of nodes forming its vertices. The more compact the arrangement of elements and nodes, the lower the memory requirements. Some terms associated with element and node arrangements in a mesh are as follows.
- Bandwidth: It is the maximum difference between the indices of neighboring cells in a zone, i.e. if the cells in the zone are numbered sequentially in increasing order, the bandwidth is the largest difference between the indices of any two adjacent cells.
- Excerpt from user manual - FLUENT: Since most of the computational loops are over faces, you would like the two cells in memory cache at the same time to reduce cache and/or disk swapping i.e. you want the cells near each other in memory to reduce the cost of memory access.
- In general, the faces and cells are reordered so that neighboring cells are near each other in the zone and in memory, resulting in a more diagonal matrix, that is, the non-zero elements are closer to the diagonal. Refer to "banded matrices" in the context of numerical simulations.
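The bandwidth definition above, and the effect of renumbering on it, can be illustrated on a tiny cell-connectivity graph. The sketch below uses a Cuthill-McKee-style breadth-first renumbering purely as an example; the text does not say which algorithm a given solver uses, and the function names, the dictionary-of-neighbors layout and the choice of starting cell are assumptions. It also assumes the mesh graph is connected.

```python
from collections import deque


def bandwidth(neighbors):
    """Maximum index difference between any two neighboring cells."""
    return max((abs(i - j) for i, nbrs in neighbors.items() for j in nbrs),
               default=0)


def breadth_first_order(neighbors, start):
    """Visit cells breadth-first from `start`, lower-degree neighbors first."""
    order, seen, queue = [], {start}, deque([start])
    while queue:
        cell = queue.popleft()
        order.append(cell)
        for nbr in sorted(neighbors[cell], key=lambda c: len(neighbors[c])):
            if nbr not in seen:
                seen.add(nbr)
                queue.append(nbr)
    return order


def renumber(neighbors, order):
    """Relabel cells so that the k-th visited cell gets index k."""
    new_index = {old: new for new, old in enumerate(order)}
    return {new_index[c]: [new_index[n] for n in nbrs]
            for c, nbrs in neighbors.items()}


# Six cells forming a 1D chain, but numbered in a scrambled order:
# physical sequence of cell labels along the chain is 0, 5, 1, 4, 2, 3.
scrambled = {0: [5], 5: [0, 1], 1: [5, 4], 4: [1, 2], 2: [4, 3], 3: [2]}
reordered = renumber(scrambled, breadth_first_order(scrambled, start=0))
print("bandwidth before:", bandwidth(scrambled))
print("bandwidth after: ", bandwidth(reordered))
```

After renumbering, every cell's neighbors carry adjacent indices, so the corresponding coefficient matrix has its non-zero entries next to the diagonal.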
In order to reduce the simulation run time, parallel processing methods have been developed in which the matrix arithmetic is broken into segments and assigned to different processors. Some of the terms associated with this technology are described below. The image from "Optimising the Parallelisation of OpenFOAM Simulations" by Shannon Keough outlines the layout of various components in the cluster.
- HPC: High Performance Computing - A generic term used to describe the infrastructure (both hardware and software) for parallel processing.
- Cluster: A collection of workstations or nodes connected with each other by a high speed network such as 1 Gb/s (gigabit) Ethernet or InfiniBand.
- Node: Each independent component of a cluster or HPC set-up. Each node has the following configuration:
- Chassis: For example - HP Z820
- CPU: For example - Intel XEON-E5 2687W
- Cores: For example - 8 cores per CPU @ 3.1GHz
- RAM: For example - 8GB DDR3-1600 Registered ECC memory
- Storage: For example - 1TB SATA HDD plus network file system (NFS) server
- Operating System: For example - CentOS 6.5
- Application Program: For example - CFD Software: OpenFOAM or FLUENT V17.2
- Hyper-threading: It is a method which allows each core on the CPU to present itself to the operating system as two cores: one real and one virtual. The operating system can then assign jobs to the virtual cores and these jobs are run when the real core would otherwise be idle (such as during memory read/write) theoretically maximising the utilisation of the CPU.
- Excerpts from STAR-CCM+ User Manual: "Generally, for best performance, it is recommended that you turn off any features that artificially increase the number of cores on a processor, such as hyper-threading. In situations where hyper-threading is turned on to benefit other applications, you should generally avoid loading more processes than there are physical cores.
In addition to turning off hyper-threading, it is recommended that you set your BIOS settings to favor performance rather than power saving. Almost all heavily parallelized applications suffer performance problems if the CPU frequencies are spun up and down. In most parallel applications, and STAR-CCM+ in particular, when one core is operating at a reduced frequency, all other cores are running at lower frequencies as well."
- Memory Intensive Applications: This refers to programs whose performance is influenced more by RAM than by the clock-speed of the CPU. For example - OpenFOAM
- Partitioning: This is nothing but "work distribution" among CPUs. However, tracking of nodes and elements across partitions is important. Excerpts from FLUENT user's manual: "Balancing the partitions (equalizing the number of cells) ensures that each processor has an equal load and that the partitions will be ready to communicate at about the same time. Since communication between partitions can be a relatively time-consuming process, minimizing the number of interfaces can reduce the time associated with this data interchange."
- MPI: Message Passing Interface - Each compute node is connected to every other compute node through the network and relies on inter-process communication to perform functions such as sending and receiving arrays, summations over all cells, and assembling the global matrix - remember A.x = b. Inter-process communication is managed by a message-passing library, that is, an appropriate implementation of the Message Passing Interface (MPI) standard such as Open MPI, Intel MPI, Cray MPI, IBM Platform MPI, Microsoft MPI, MPICH2 or a vendor's own version of MPI.
- Scalability: It refers to the decrease in turnaround time for a solution as the number of compute nodes increases. However, beyond a certain point the ratio of network communication (refer to the definition of Cluster above) to computation increases, leading to reduced parallel efficiency. Optimal system sizing is therefore important for any simulation; it is determined from the ratio of the time to compute to the time taken to exchange data between cluster nodes.
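The trade-off between computation and communication described under Scalability can be illustrated with a toy run-time model. The sketch below assumes that compute time divides perfectly by the number of nodes while communication time grows linearly with it; this model, the function names and the numbers (100 s of serial compute, 0.5 s of communication per additional node) are all assumptions made for illustration, not measurements of any real cluster.

```python
def modeled_runtime(t_compute, t_comm_per_node, n_nodes):
    """Toy model: ideal compute scaling plus a linear communication cost."""
    return t_compute / n_nodes + t_comm_per_node * (n_nodes - 1)


def speedup(t_serial, t_parallel):
    """How many times faster the parallel run is than the serial run."""
    return t_serial / t_parallel


def parallel_efficiency(t_serial, t_parallel, n_nodes):
    """Speedup divided by the number of nodes (1.0 means ideal scaling)."""
    return speedup(t_serial, t_parallel) / n_nodes


t_serial = modeled_runtime(100.0, 0.5, 1)
for n in (1, 2, 4, 8, 16, 32):
    t = modeled_runtime(100.0, 0.5, n)
    eff = parallel_efficiency(t_serial, t, n)
    print(f"nodes={n:2d}  runtime={t:7.3f} s  efficiency={eff:.2f}")
```

In this toy model the run time stops improving somewhere between 16 and 32 nodes, because the communication term starts to outweigh the shrinking compute term; a scaling study on the actual hardware is what determines the real optimum.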