Key factors that affect OLGA simulation speed

I recently came across a discussion about OLGA speed in OLGA Users on LinkedIn. The discussion begins with the question of how OLGA performs on two different CPUs: an Intel Core i7 processor running at 3.4 GHz and an Intel Core i5 processor running at 3.4 GHz. Because Core i5 processors are significantly cheaper, the reported result is both surprising and alarming.
I've seen flow assurance companies purchase expensive hardware in the hope of speeding up OLGA. These investments have not always paid off. As a flow assurance consultant, I witnessed one of these misses first hand: we found that OLGA ran as fast on one-year-old desktops as it did on the expensive hardware. Since then I have spent a lot of time studying OLGA speed and trying to understand what factors affect OLGA performance.

I decided to share what I learned from this research to help flow assurance companies make such buying decisions. I also wanted to add data and analysis to the discussion, looking specifically at the role of the number of threads in OLGA speed. Torgeir Vanvik, Schlumberger, provided excellent insights into how the OLGA software works. I hope this post sheds some light on the topic.
The key factors that impact OLGA simulation speed

There are several factors that can affect OLGA’s simulation speed. Some are related to the complexity of the numerical modeling, while others relate to the hardware that OLGA runs on.

The most important factor in modeling is the network’s complexity. Single branch models generally run faster than networks, while simple converging networks are faster than diverging networks. This is something that flow assurance engineers cannot control, so it is not worth further discussion.

Next are the section lengths and the numerical time step. The simulation time step in OLGA is controlled by the MINDT and MAXDT parameters in INTEGRATION and by the DTCONTROL parameters. Simulations are usually run with CFL control of the time step to ensure stability. The CFL condition limits how far fluid can move in one time step relative to the length of a section, so the section lengths determine how long your time steps can be. Model speed is therefore affected by INTEGRATION, DTCONTROL and the section lengths, and it is usually determined by the smallest section in the network. I could write a lot more about this, but that is another topic.
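
The link between section length and time step can be illustrated with a small sketch. This is the textbook CFL limit (time step no larger than section length divided by fluid velocity), clamped to a minimum and maximum in the spirit of MINDT and MAXDT. It is not OLGA's actual time step algorithm, and the function name, arguments and numbers are hypothetical.

```python
def cfl_time_step(section_lengths_m, velocities_m_s, min_dt_s, max_dt_s):
    """Largest time step (s) that satisfies a simple CFL limit in every section.

    Assumption: fluid may travel at most one section length per time step.
    min_dt_s and max_dt_s play the role of MINDT and MAXDT (illustrative only).
    """
    # Per-section limit: dt <= dx / |u|. Stagnant sections impose no limit.
    limits = [
        length / abs(velocity)
        for length, velocity in zip(section_lengths_m, velocities_m_s)
        if velocity != 0.0
    ]
    dt = min(limits) if limits else max_dt_s
    # Clamp to the user-specified bounds.
    return max(min_dt_s, min(dt, max_dt_s))


# Hypothetical pipeline: three sections, fluid moving at 2 m/s everywhere.
uniform = cfl_time_step([100.0, 100.0, 100.0], [2.0, 2.0, 2.0], 0.01, 60.0)
one_short = cfl_time_step([100.0, 100.0, 5.0], [2.0, 2.0, 2.0], 0.01, 60.0)
print(f"uniform sections: dt = {uniform} s")
print(f"one 5 m section:  dt = {one_short} s")
```

In this sketch a single 5 m section cuts the allowed time step by a factor of 20, which is why the smallest section tends to set the pace for the whole model.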

The key factors that impact simulation speed on the hardware side are CPU speed and I/O speed.
The processor

Two specifications matter most for modern CPUs: clock speed and core count. Clock speed indicates how many instructions a core can process per second, and the number of cores indicates how many instructions can be processed simultaneously. Modern versions of OLGA (6 and later) can take advantage of multiple cores, while older versions (5 and earlier) get no benefit from multicore processors.

Clock speed is crucial regardless of the OLGA version, because it comes down to the number of instructions that can be processed per second.

The speed of OLGA 6 and later versions also depends on the number and type of cores. It is easy to believe that more cores mean faster simulations. The sad reality is that some tasks are better suited to parallel processing than others. A task will actually run slower in parallel if the time it takes to break it down into smaller problems is greater than the time saved by processing them in parallel. The theoretical limit of parallelization depends on the problem, and this holds true for OLGA as well.

It is necessary to investigate OLGA parallelizability in order to answer practical questions such as “Is it better having a 3.4 GHz 4-core CPU or a 2.4 GHz 16-core CPU?” We will actually explore this topic in the next post.
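
To see why the answer depends on parallelizability, here is a rough sketch based on Amdahl's law, a standard textbook model that this post does not itself use. It assumes run speed scales linearly with clock speed and that a fixed fraction of the work is parallelizable. It ignores the overhead of splitting the work, and the parallel fractions shown are hypothetical, not measured OLGA values.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Amdahl's law: speedup on `cores` cores when only part of the work is parallel."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


def relative_speed(clock_ghz, cores, parallel_fraction):
    """Crude figure of merit: clock speed times Amdahl speedup (assumed linear in GHz)."""
    return clock_ghz * amdahl_speedup(parallel_fraction, cores)


# Hypothetical parallel fractions; real values would have to be measured.
for fraction in (0.5, 0.9):
    few_fast = relative_speed(3.4, 4, fraction)
    many_slow = relative_speed(2.4, 16, fraction)
    winner = "3.4 GHz 4-core" if few_fast > many_slow else "2.4 GHz 16-core"
    print(f"parallel fraction {fraction}: {few_fast:.2f} vs {many_slow:.2f} -> {winner}")
```

Under these assumptions the faster 4-core CPU wins when half of the work is parallel, and the 16-core CPU wins when 90% of it is, so the question cannot be settled without measuring the model.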

I/O

OLGA writes simulation results to disk while it is running. This can slow the simulation down, sometimes severely. Two common hardware bottlenecks are the hard drive speed (when OLGA saves locally) and the network bandwidth (when OLGA writes to a network disk).

Consumer-grade laptops and desktops come with mechanical hard drives spinning at either 5400 or 7200 RPM, while server-grade machines are often equipped with 10k or 15k RPM drives. The spin speed determines how fast the drive can be read and written, so OLGA will generally run faster on a drive with a higher spin rate. Solid state drives (SSDs) have become cheap and mature enough for commercial use. SSDs are generally much faster than traditional mechanical drives, but their speed varies depending on the manufacturer, so they are not always as fast as you would like. The computer bus interface, which controls internal data transfer rates, also matters and is often a bottleneck. The hard drive's performance can be just as important as the CPU in terms of simulation speed.

The network may limit how fast OLGA can save results to a network share. Companies should ensure that there is sufficient bandwidth between the OLGA computer and the network storage to reduce these slowdowns.

You can avoid these bottlenecks by carefully considering the frequency of simulation outputs.

Methodology

To eliminate the effect of I/O on parallel speedup, all models were run without trend or profile outputs. Each model was run multiple times with each number of threads to ensure repeatability. It was easy to write a program that ran every model up to 20 times within ten minutes. This ensured that all models ran at least 2 times, and most ran the full 20 times. The average run time was calculated for each combination of model and thread count. Notably, the run times were almost identical from one repetition to the next. This study was conducted using OLGA 2014.2 (see acknowledgements at the end). The number of threads that OLGA uses was set through the OMP_NUM_THREADS environment variable. A thread is a sequence of program instructions that the operating system can manage separately. A single core of a modern CPU can handle two threads.
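
A timing harness along these lines could look like the sketch below. It is not the program used in the study. It assumes the benchmarked program reads OMP_NUM_THREADS from its environment, it reads "up to 20 times in ten minutes, at least twice" as a run cap plus a time budget, and the function name `time_command` is hypothetical. The workload shown is a stand-in (a Python process that does nothing), because the real OLGA command line is not given here.

```python
import os
import subprocess
import sys
import time
from statistics import mean


def time_command(command, num_threads, max_runs=20, time_budget_s=600.0, min_runs=2):
    """Run `command` repeatedly with OMP_NUM_THREADS set; return (mean seconds, runs).

    Stops after `max_runs` runs, or once the time budget is used up,
    but always completes at least `min_runs` runs.
    """
    # Copy the current environment and override only the thread count.
    env = dict(os.environ, OMP_NUM_THREADS=str(num_threads))
    durations = []
    budget_start = time.perf_counter()
    while len(durations) < max_runs:
        budget_used = time.perf_counter() - budget_start
        if len(durations) >= min_runs and budget_used >= time_budget_s:
            break
        run_start = time.perf_counter()
        subprocess.run(command, env=env, check=True)  # raises if the run fails
        durations.append(time.perf_counter() - run_start)
    return mean(durations), len(durations)


if __name__ == "__main__":
    # Stand-in workload; replace with the actual simulator command line.
    command = [sys.executable, "-c", "pass"]
    for threads in (1, 2, 4, 8):
        average, runs = time_command(command, threads, max_runs=3, time_budget_s=10.0)
        print(f"{threads} threads: {average:.3f} s average over {runs} runs")
```

Setting the variable per child process, rather than globally, keeps each thread count isolated so the runs can be scripted back to back.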

All simulations were performed on a machine that had 4 cores and was capable of running 8 threads simultaneously.
Results

The first plot displays the speedups achieved by different models. The ideal speedup line indicates that a model run with n threads should run n times faster than the same model run with one thread. OLGA does not require you to specify the number of CPU cores; in our case, it is 4.
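
For reference, speedup and parallel efficiency can be computed from measured run times as in this sketch, which uses the standard definitions (speedup is the one-thread time divided by the n-thread time, and efficiency is speedup divided by n). The run times below are hypothetical, chosen only so that the result matches the 75% efficiency at 4 threads quoted in the summary.

```python
def speedup(one_thread_time_s, n_thread_time_s):
    """How many times faster the n-thread run is than the one-thread run."""
    return one_thread_time_s / n_thread_time_s


def parallel_efficiency(one_thread_time_s, n_thread_time_s, n_threads):
    """Achieved speedup as a fraction of the ideal speedup (n_threads)."""
    return speedup(one_thread_time_s, n_thread_time_s) / n_threads


# Hypothetical timings: 120 s on one thread, 40 s on four threads.
s = speedup(120.0, 40.0)
e = parallel_efficiency(120.0, 40.0, 4)
print(f"speedup = {s}, ideal = 4, efficiency = {e:.0%}")
```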

Analysis

The efficiency and parallel speedup plots revealed that different types of models had different parallelization efficiency. The next question is how a model can be made more parallelizable.

The main calculation loop in OLGA would be the time loop, which advances the simulation from the start time to the end time. The initial sequential part would include reading input files, tab files and other setup steps. Final post-processing could include closing file handles and releasing memory.

We used this background to calculate the parallel efficiency curves using an exponential function:

\mu_p = e^{c(n_p - 1)}

Where

\mu_p is the parallel efficiency,

c is the parallel efficiency decay factor,

n_p is the number of threads.

I refer to the calculated c factor as the parallel efficiency decay factor. The decay factor can be plotted as a function of various aspects of the model. It is strongly related to the model run time as well as the number of sections.
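
As an illustration of the exponential model, the sketch below evaluates the efficiency for a given c and recovers c from efficiency data. The fitting method (least squares through the origin on the logarithm of efficiency) is my assumption, since the post does not say how c was calculated. With the formula as written, an efficiency that decays with thread count corresponds to a negative c. The only real data point used is the 75% efficiency at 4 threads from the summary; the rest is synthetic.

```python
import math


def efficiency_from_decay(c, n_threads):
    """Parallel efficiency predicted by mu_p = exp(c * (n_p - 1))."""
    return math.exp(c * (n_threads - 1))


def decay_from_point(efficiency, n_threads):
    """Decay factor implied by a single (threads, efficiency) measurement."""
    if n_threads <= 1:
        raise ValueError("need more than one thread to infer a decay factor")
    return math.log(efficiency) / (n_threads - 1)


def fit_decay(thread_counts, efficiencies):
    """Least-squares fit of c in ln(mu_p) = c * (n_p - 1), forced through the origin."""
    xs = [n - 1 for n in thread_counts]
    ys = [math.log(mu) for mu in efficiencies]
    denominator = sum(x * x for x in xs)
    if denominator == 0:
        raise ValueError("need at least one measurement with more than one thread")
    return sum(x * y for x, y in zip(xs, ys)) / denominator


# Decay factor implied by 75% efficiency at 4 threads.
c = decay_from_point(0.75, 4)
print(f"c = {c:.4f}")
print(f"predicted efficiency at 8 threads = {efficiency_from_decay(c, 8):.2f}")
```

The 8-thread figure is an extrapolation of the model from a single point, not a measurement.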

Let’s sum it all up…

Let’s return to hardware choices, which have a significant impact on OLGA speed. The number of cores, the clock speed and the I/O speed are all important factors. More recent OLGA versions are multithreaded, which allows them to run faster by using multiple processor cores. We analysed how the number of cores affects OLGA speed, and whether extra cores are worth the money.

By default, OLGA uses as many threads as possible. Our analysis showed that 4 threads were the fastest, with a parallel efficiency of 75%. The speedup is generally greater for simulations that are more computationally intensive, and multithreading did not help short simulations. For a simulation with 7000 sections, switching from 4 to 8 threads did not increase the speed. Parallel efficiency decreases as you go beyond 4 threads. Based on our analysis, we believe that 4 threads is a good number for running flow assurance models in OLGA. This could be tuned for specific models, but I wouldn’t recommend it.

In keeping with our findings, OLGA recommends using the available cores to run simultaneous simulations rather than to speed up individual simulations. This advice is, however, a little too optimistic. Most professional laptops and desktops today have four cores, but they lack the hard drive access speed needed to support several simulations writing data at the same time. The best choice lies somewhere in the middle.

When making a hardware purchasing decision, I wouldn’t go beyond 4 cores for a computer that has a mechanical hard disk. If you have enough OLGA licences and wish to centralize your simulations, the storage option is just as important as the processor choice. To maximize parallel efficiency, I recommend setting the OMP_NUM_THREADS environment variable to 4.