Getting Started with PGI Compilers on AWS

PGI Community Edition compilers and tools for Linux/x86-64 provide a low-cost option for people interested in GPU-accelerated computing. These tools are now available as an Amazon Machine Image (AMI) on the AWS Marketplace, extending this low-cost approach to GPU-accelerated computing to Amazon’s extensive cloud computing resources. You can create your own personal virtualized NVIDIA Volta V100 GPU-enabled system on Amazon’s cloud for as little as $3 per hour. Just upload your application’s source code, build it using the PGI compilers, and run it. This article guides you through the steps necessary to build and run an application using PGI compilers and demonstrates how GPU-accelerated computing can be cost-effective on Amazon’s cloud infrastructure.

AWS Terminology

The key to learning and effectively using Amazon Web Services’ cloud computing infrastructure is understanding Amazon’s terminology. Let’s take a few moments to define a few of these terms, used extensively throughout the remainder of this article:

  • Amazon Elastic Compute Cloud (EC2). The service that allows customers to rent virtual computers within Amazon’s cloud computing infrastructure.
  • Amazon Machine Image (AMI). A pre-configured virtual machine image containing an operating system plus the applications required for its intended purpose. AMIs based on several different operating systems can be used, including Microsoft Windows and several distributions of Linux. However, the PGI Community Edition AMI is currently available only on Ubuntu Linux.
  • Instance. A copy of an AMI running as a virtual computer. Users create instances from an AMI and customize them according to their particular needs.
  • Instance Types. Predefined configurations that specify processor(s), memory, storage, network capability and usage cost. Users choose from the various instance types made available for the particular AMI when creating an instance. See the on-demand pricing page for fees for each instance type. Note that the more powerful instance types are also more expensive.
  • Elastic Block Store (EBS). Persistent storage for use with EC2 instances. AMIs are normally configured with an EBS volume as the root storage device containing the operating system and applications. Users can increase the size of this root volume when creating an instance. Users can also create separate EBS volumes that can be mounted to instances. Note that the use of EBS incurs additional usage charges, typically $0.10-$0.12 per GiB per month for general purpose SSD volumes.
  • Regions. AWS hosts cloud computing resources at data centers in various geographic regions worldwide. Pricing varies by region. For this article, we will be using the “Oregon” or “us-west-2” region.
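
To get a feel for how storage charges add up, the short Python sketch below estimates the monthly cost of a single EBS volume from the price range quoted above. The function name and the 20 GiB example size are illustrative assumptions, and actual prices vary by region and volume type, so treat the result as a rough estimate only.

```python
def ebs_monthly_cost(size_gib, price_per_gib_month):
    """Estimated monthly storage charge in dollars for one EBS volume."""
    if size_gib < 0 or price_per_gib_month < 0:
        raise ValueError("size and price must be non-negative")
    return size_gib * price_per_gib_month


# Price range quoted above for general purpose SSD volumes (dollars per GiB-month).
LOW_RATE, HIGH_RATE = 0.10, 0.12

# Hypothetical example: a 20 GiB volume.
low = ebs_monthly_cost(20, LOW_RATE)
high = ebs_monthly_cost(20, HIGH_RATE)
print(f"20 GiB volume: ${low:.2f} to ${high:.2f} per month")
```

Keep in mind that EBS storage is billed for as long as the volume exists, including while the instance it is attached to is stopped.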

Signing in to AWS

Head to the Amazon Web Services page and click on the orange box in the upper right corner of the page to get started. The text in the box will say Create an AWS Account if you don’t already have an AWS account. Click on this box and proceed through the steps as prompted to create your account. (Alternatively, you can use the Create an AWS Account page.) You will need to provide some personal information, including a credit card. Amazon prorates the hourly charges of EC2 resources by the minute, so you only pay for what you use.

Note that new accounts include “free tier” access for 12 months, though GPU-accelerated computing is not presently included in the free tier. Amazon also offers grants to subsidize usage of EC2 compute resources to students, educators, and researchers for approved projects.

Once you have set up an account, AWS saves a cookie to your computer, so you’ll see Sign In to the Console instead on future visits, as seen in figure 1.

Amazon Web Portal sign in page screenshot
Figure 1. AWS portal sign-in

Now you can sign in to the AWS console. After entering your credentials and clicking on the blue Sign In button, the AWS Console screen appears, as shown in figure 2.

AWS console page screenshot
Figure 2. AWS console main page

Creating an Instance

Select the EC2 service from the main AWS console by clicking on All Services then EC2 as shown in figure 2 above. When you log into the AWS console in the future, a link to EC2 will also appear under the list of Recently visited services.

The EC2 dashboard should look like figure 3 below.

AWS EC2 dashboard screenshot
Figure 3. AWS EC2 dashboard

Now click on the blue “Launch Instance” button to create a new instance. Let’s work through the series of steps to configure and bring online your AWS EC2 instance.

  1. Choose an Amazon Machine Image (AMI). You should now see a screen like figure 4 below, showing Step 1 at the top. Click on AWS Marketplace in the left column, type “PGI” in the search box, and select the PGI Community Edition AMI. Figure 4 outlines these steps in order.
Selecting the PGI Community Edition AMI screenshot
Figure 4. Step 1: Selecting the PGI Community Edition AMI

You’ll see a pop-up window with details about the PGI AMI next, showing available instance types and pricing for the AMI, as shown in figure 5. Review the details, including the End User License Agreement, then press the blue “Continue” button to proceed.

PGI Community Edition details screenshot
Figure 5. PGI Community Edition AMI details
  2. Choose an Instance Type. We will experiment with several different instance types during the remainder of this article. Choose a c5.xlarge instance type for the initial experiment, which costs about $0.17 per hour, as shown in figure 6.
Choosing an instance type
Figure 6. Step 2: Choose an instance type

If the defaults for this instance look good to you, select the blue Review and Launch button here. Otherwise, select the Next: Configure Instance Details button to customize some more configuration details about the instance you are about to create.

  3. Configure Instance Details (figure 7). For now, you don’t need to change anything here. You can review these options to see the available configuration settings if you like. More advanced configurations might require tweaking some of these options. For now, just click on the Next: Add Storage button.
Configuring the instance screenshot
Figure 7. Step 3: Configure Instance Details
  4. Add Storage (figure 8). The PGI AMI includes a 20 GiB General Purpose SSD EBS volume as the root storage device. If you need more, you can easily increase the size of this volume, or alternatively add a new volume. For example, you might want to create a volume that contains applications or data that is shared among multiple EC2 instances. Click on the Next: Add Tags button to proceed to the next screen.
Add storage screenshot
Figure 8. Step 4: Add storage
  5. Add Tags (figure 9). Don’t worry about adding any tags to your EC2 instance for the time being. Click on the Next: Configure Security Group button to proceed to the next screen.
Adding tags AWS page
Figure 9. Step 5: Add Tags
  6. Configure Security Group (figure 10). A security group is a set of firewall rules that defines which connections can be made to your instance. By default, the security group accepts SSH connections on port 22 (the standard SSH port) from any IP address. You can restrict connections to your local IP addresses if you wish. Once you are satisfied with these settings, click on the Review and Launch button, as highlighted by the red box in figure 10.
Configure security group screenshot
Figure 10. Step 6: Configure Security Group
  7. Review Instance Launch (figure 11). This screen gives you one last opportunity to review all the settings for your instance.
Review instance launch screenshot
Figure 11. Step 7: Review Instance Launch

When you click the Launch button, a window pops up enabling you to select an existing SSH key pair for authenticating to your AWS instance or create a new SSH key pair if you have not already done so, as seen in figure 12. For security purposes, all logins to AWS instances require SSH key pairs rather than sending cleartext passwords through SSH for authentication. This also allows you to access your instance from scripts without having to store an SSH password in the script.

If you need to create a new SSH key pair for logging in to AWS EC2, pull down the menu item that says Choose an existing key pair and select Create a new key pair. Give your key pair a name in the following text entry box, then click on the Download Key Pair button. The downloaded file should have a name with a .pem extension, e.g. MyKey.pem. Save this file in a safe location, because you will need it to log in to the AWS EC2 instances you create.

Configuring new SSH key pair screenshot
Figure 12. Configuring an SSH Key Pair

See the EC2 Key Pair documentation page for information on creating an SSH key pair. More information about using SSH to connect to your instance can be found in the EC2 User Guide.
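
On Linux and macOS, the OpenSSH client refuses to use a private key file whose permissions allow other users to read it, so it is worth restricting the downloaded .pem file to owner read-only access (the equivalent of chmod 400). The following Python sketch shows one way to do that. The helper name is hypothetical, and the demonstration runs against a throwaway placeholder file rather than a real key.

```python
import os
import stat
import tempfile


def restrict_key_permissions(key_path):
    """Make a private key file readable by its owner only (mode 0400)."""
    os.chmod(key_path, stat.S_IRUSR)
    # Return the resulting permission bits so the caller can verify them.
    return stat.S_IMODE(os.stat(key_path).st_mode)


# Demonstrate on a placeholder file; substitute the path to your own key.
with tempfile.TemporaryDirectory() as tmp:
    fake_key = os.path.join(tmp, "MyKey.pem")
    with open(fake_key, "w") as handle:
        handle.write("placeholder, not a real key\n")
    demo_mode = restrict_key_permissions(fake_key)
    print(oct(demo_mode))
```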

Logging in to the Instance

Once you have created and launched an instance, you can view it from the EC2 Dashboard. Refer to figure 13 below for an example. Note the DNS name or IP address of the running instance. You will use this information to log in to the running instance.

Before proceeding, make sure the “Instance State” field shows “running” and the “Status Checks” field shows “2/2 checks passed” or similar. If the “Status Checks” field still shows “Initializing” the instance is not yet ready to accept connections.

AWS EC2 Instances list
Figure 13. AWS EC2 Instances

The PGI AMI includes a user account named ‘ubuntu’ which has full sudo privileges. You should create an alternate account for yourself using your preferred username.

You can now connect and log in to your instance. For example, suppose your instance has been brought up on an IP address of 192.168.144.127 and you are using the private key stored in the file MyKey.pem. If you use the OpenSSH client bundled with Linux, macOS, FreeBSD, or various other operating systems, you can issue the following command to log into your instance:

$ ssh -i MyKey.pem ubuntu@192.168.144.127
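
If you plan to log in from scripts, it can help to assemble the ssh command programmatically. The sketch below builds the same command shown above as an argument list. The helper name and the optional remote command parameter are assumptions made for illustration.

```python
import shlex


def build_ssh_command(key_file, host, user="ubuntu", remote_command=None):
    """Return the argument list for an OpenSSH login that uses a key file."""
    command = ["ssh", "-i", key_file, f"{user}@{host}"]
    if remote_command is not None:
        # Anything after the destination is executed on the remote host.
        command.append(remote_command)
    return command


login = build_ssh_command("MyKey.pem", "192.168.144.127")
print(shlex.join(login))
```

You could then pass the list to subprocess.run(login, check=True) to open the session from a script.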

If you are using the PuTTY client on Windows, you need to use the PuTTYgen tool (available as part of the complete PuTTY installation package) to convert your key to a .ppk file that PuTTY can use.

When you log in to your instance successfully, you should see a banner message and a prompt similar to the following:

* Documentation:  https://help.ubuntu.com
* Management:     https://landscape.canonical.com
* Support:        https://ubuntu.com/advantage

Get cloud support with Ubuntu Advantage Cloud Guest:

http://www.ubuntu.com/business/services/cloud

4 packages can be updated.
0 updates are security updates.
==================================

==     PGI Community Edition     ==

== with OpenACC and CUDA Fortran ==

===================================
PGI Community Edition version 18.10

Copyright (c) 2018, NVIDIA CORPORATION. All rights reserved.

Invoke the PGI Fortran, C or C++ compilers as follows:

pgcc|pgc++|pgfortran [options] <source-filename>

For more information, see the online documentation at:

https://www.pgroup.com/resources/docs/18.10/x86/

If you see a message such as *** System restart required ***, the underlying Ubuntu Linux operating system has automatically downloaded an important security update and the system needs to be rebooted in order to apply it. You should issue the following command to reboot your instance:

$ sudo shutdown -r now

Then wait a few moments, and log back in again.

Building and Running an Application on the Instance

This section guides you through building and running CloverLeaf, which is “a hydrodynamics mini-app to solve the compressible Euler equations in 2D, using an explicit, second-order method.” Obtain the CloverLeaf source code by issuing the following command:

$ git clone --recurse-submodules https://github.com/UK-MAC/CloverLeaf.git

Once this command finishes, you should see a new directory named CloverLeaf.

Serial Version

First, log into the c5.xlarge instance you created above and build the serial version of CloverLeaf. This version runs on only one CPU core and serves to provide a baseline time for performance of the application:

$ cd CloverLeaf
$ make serial COMPILER=PGI

When the build completes, an executable named clover_leaf appears in the CloverLeaf_Serial subdirectory. Try running CloverLeaf using the clover_bm32.in input deck. CloverLeaf expects its input file to be named clover.in. First move the existing clover.in file out of the way and then copy the clover_bm32.in file from the InputDecks subdirectory to the current directory as clover.in:

$ cd CloverLeaf_Serial
$ mv clover.in clover.in.bak
$ cp InputDecks/clover_bm32.in clover.in
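
Because the same input-deck swap is repeated for every build in this article, you may prefer to script it. The Python sketch below mirrors the mv and cp commands above. The function name is hypothetical, and the demonstration uses a temporary directory with placeholder files in place of a real CloverLeaf build directory.

```python
import shutil
import tempfile
from pathlib import Path


def install_input_deck(run_dir, deck_name="clover_bm32.in"):
    """Back up clover.in (if present) and replace it with the chosen deck."""
    run_dir = Path(run_dir)
    active = run_dir / "clover.in"
    deck = run_dir / "InputDecks" / deck_name
    if not deck.is_file():
        raise FileNotFoundError(f"input deck not found: {deck}")
    if active.exists():
        # Equivalent of: mv clover.in clover.in.bak
        shutil.move(str(active), str(run_dir / "clover.in.bak"))
    # Equivalent of: cp InputDecks/<deck_name> clover.in
    shutil.copyfile(deck, active)
    return active


# Demonstration with placeholder files standing in for a build directory.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "InputDecks").mkdir()
    (root / "InputDecks" / "clover_bm32.in").write_text("benchmark deck\n")
    (root / "clover.in").write_text("default deck\n")
    install_input_deck(root)
    active_text = (root / "clover.in").read_text()
    backup_text = (root / "clover.in.bak").read_text()
```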

Now, you should be able to run the serial version of CloverLeaf as follows:

$ ./clover_leaf

The serial version takes about 3.75 hours to complete. You should see a series of time steps printed to the screen, culminating in the final step:

Average time per cell    1.5531565209357159E-007
Step time per cell       1.6049411594091604E-007
Step    2955 time   2.1820157 control    sound    timestep   7.45E-04       1,       1 x  6.51E-04 y  1.30E-03
Wall clock     13535.59463596344

The total cost of running the serial version of CloverLeaf on the c5.xlarge instance is roughly $0.17 per hour x 3.760 hours = $0.64.
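
The arithmetic generalizes to any run: convert the reported wall clock time from seconds to hours and multiply by the hourly rate of the instance type. Here is a minimal Python sketch, assuming the $0.17 hourly rate used above. The function name is hypothetical.

```python
def run_cost(wall_clock_seconds, hourly_rate):
    """Approximate cost in dollars of a run billed at an hourly rate."""
    hours = wall_clock_seconds / 3600.0
    return hours * hourly_rate


# Wall clock time reported by the serial run on the c5.xlarge instance.
serial_cost = run_cost(13535.59463596344, 0.17)
print(f"${serial_cost:.2f}")
```

This estimate covers only the time the application is running. Time spent logging in, building, and copying files is billed as well.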

Changing the Instance Type

You need to change the instance type for subsequent experiments. To do this, first stop your instance using the steps in Shutting Down the Instance at the end of this article.

Change the instance type by navigating to the Instances page of the EC2 Dashboard as shown above in figure 13. Select the instance, click the Actions button at the top or right-click on the instance, choose Instance Settings > Change Instance Type, and choose from the drop-down in the Change Instance Type pop-up, as you can see in figure 14. For the next experiment, let’s use the c5.9xlarge instance type.

Changing instance type screenshot
Figure 14. Changing the instance type

To restart the instance, either click Actions or right-click on the instance. Choose Instance State > Start.

Building and Running Parallel Applications on the Instance

OpenMP Parallel Version

Next try building the OpenMP version of CloverLeaf. For this experiment, stop the instance and change the instance type to c5.9xlarge as described above. This instance type provides 36 virtualized CPU cores at a rate of $1.53 per hour and should deliver a substantial speedup over the serial version.

Start the instance and log into it as before. Then change into the top-level CloverLeaf directory and issue the following command:

$ make openmp COMPILER=PGI

Once the build completes, change to the CloverLeaf_OpenMP subdirectory and copy the same clover_bm32.in input file as described previously:

$ cd CloverLeaf_OpenMP
$ mv clover.in clover.in.bak
$ cp InputDecks/clover_bm32.in clover.in

You can now run the OpenMP version of CloverLeaf as follows:

$ OMP_NUM_THREADS=36 ./clover_leaf
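
If you launch the run from a script instead, you can set the thread count through the environment passed to the child process. The sketch below is one way to do this in Python. The helper name is hypothetical, and falling back to os.cpu_count() when no thread count is given is an assumption about what makes a sensible default.

```python
import os


def openmp_env(num_threads=None, base_env=None):
    """Return a copy of the environment with OMP_NUM_THREADS set."""
    env = dict(os.environ if base_env is None else base_env)
    if num_threads is None:
        # Assumed default: one thread per CPU visible to the operating system.
        num_threads = os.cpu_count() or 1
    env["OMP_NUM_THREADS"] = str(num_threads)
    return env


env = openmp_env(36)
print(env["OMP_NUM_THREADS"])

# To start the run from Python, something like the following should work:
#   import subprocess
#   subprocess.run(["./clover_leaf"], env=env, check=True)
```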

The OpenMP version requires approximately 36 minutes to complete:

Average time per cell    2.4922225906677995E-008
Step time per cell       2.4786172111311719E-008
Step    2955 time   2.1820157 control    sound    timestep   7.45E-04       1,       1 x  6.51E-04 y  1.30E-03
Wall clock     2171.918267965317

The total cost of running the OpenMP version of CloverLeaf on the c5.9xlarge instance is roughly $1.53 per hour x 0.603 hours = $0.92.

As you can see, the OpenMP version is a big win over the serial version, finishing roughly six times faster at a modestly higher cost ($0.92 versus $0.64).
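
To put numbers on the comparison, you can compute the speedup and the parallel efficiency (speedup divided by the number of cores) from the two wall clock times reported above. The following Python sketch does this. The function names are hypothetical.

```python
def speedup(baseline_seconds, parallel_seconds):
    """How many times faster the parallel run is than the baseline."""
    return baseline_seconds / parallel_seconds


def parallel_efficiency(baseline_seconds, parallel_seconds, workers):
    """Fraction of ideal linear scaling achieved across the given workers."""
    return speedup(baseline_seconds, parallel_seconds) / workers


SERIAL_SECONDS = 13535.59463596344   # serial run on c5.xlarge
OPENMP_SECONDS = 2171.918267965317   # OpenMP run, 36 threads on c5.9xlarge

omp_speedup = speedup(SERIAL_SECONDS, OPENMP_SECONDS)
omp_efficiency = parallel_efficiency(SERIAL_SECONDS, OPENMP_SECONDS, 36)
print(f"speedup {omp_speedup:.2f}x, efficiency {omp_efficiency:.1%}")
```

Note that the two runs used different instance types, so this is a comparison of whole configurations and not a strict scaling measurement on fixed hardware.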

MPI Parallel Version

You can also build a parallel version of CloverLeaf using MPI. The PGI AMI includes a build of Open MPI bundled with the PGI compilers. For this experiment, continue using the c5.9xlarge instance type from the previous section. Change back to the top-level CloverLeaf directory and issue the following command:

$ make mpi COMPILER=PGI

Once the build completes, you should change to the CloverLeaf_MPI subdirectory, and copy the same clover_bm32.in input file as described previously:

$ cd CloverLeaf_MPI
$ mv clover.in clover.in.bak
$ cp InputDecks/clover_bm32.in clover.in

You can now run the MPI version of CloverLeaf as follows:

$ mpirun -np 36 ./clover_leaf

The MPI version of CloverLeaf seems to be a bit faster than the OpenMP version:

Average time per cell    2.4026006379229807E-008
Step time per cell       2.3560820005109740E-008
Step    2955 time   2.1820157 control    sound    timestep   7.45E-04       1,       1 x  6.51E-04 y  1.30E-03
Wall clock     2093.818086147308

The total cost of running the MPI version of CloverLeaf on the c5.9xlarge instance in the Oregon region is roughly $1.53 per hour x 0.582 hours = $0.89.

The MPI version seems slightly faster than the OpenMP version at a slightly lower cost.
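
If you capture the program output in a file or a string, you can extract the wall clock time automatically instead of copying it by hand. The Python sketch below parses the output format shown above. The function name and the regular expression are assumptions based only on the sample output in this article.

```python
import re

# Matches lines such as "Wall clock     2093.818086147308".
WALL_CLOCK = re.compile(r"^\s*Wall clock\s+([0-9.+\-Ee]+)\s*$", re.MULTILINE)


def parse_wall_clock(output_text):
    """Return the last reported wall clock time in seconds."""
    matches = WALL_CLOCK.findall(output_text)
    if not matches:
        raise ValueError("no 'Wall clock' line found in output")
    return float(matches[-1])


sample_output = """\
Average time per cell    2.4026006379229807E-008
Step time per cell       2.3560820005109740E-008
Step    2955 time   2.1820157 control    sound    timestep   7.45E-04       1,       1 x  6.51E-04 y  1.30E-03
Wall clock     2093.818086147308
"""

mpi_seconds = parse_wall_clock(sample_output)
mpi_hours = mpi_seconds / 3600.0
print(f"{mpi_seconds:.1f} seconds = {mpi_hours:.3f} hours")
```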

OpenACC Parallel Multicore Version

You can also build a parallel version of CloverLeaf that runs on multiple host CPU cores using OpenACC directives. This is potentially useful for testing applications with OpenACC when a GPU is not available on the system or as a first step toward porting a given application to run on GPUs.

Before you can build the OpenACC version of CloverLeaf, you need to issue a couple of commands to fix a couple of minor build issues with this version. Change back to the top-level CloverLeaf directory and issue the following commands:

Now invoke the build as follows:

$ make openacc_kernels COMPILER=PGI

Once the build completes, you should change to the CloverLeaf_OpenACC subdirectory, and copy the same clover_bm32.in input file as described previously:

$ cd CloverLeaf_OpenACC
$ mv clover.in clover.in.bak
$ cp InputDecks/clover_bm32.in clover.in

You can now run the OpenACC version of CloverLeaf on multiple host CPU cores as follows:

$ mpirun -np 1 ./clover_leaf

CloverLeaf should complete in roughly the same amount of time as the MPI version in the previous section:

Average time per cell    2.2886566252314620E-008
Step time per cell       2.2746551419711775E-008
Step    2955 time   2.1820157 control    sound    timestep   7.45E-04       1,       1 x  6.51E-04 y  1.30E-03
Wall clock     1994.515158891678

The total cost of running the OpenACC Multicore version of CloverLeaf on the c5.9xlarge instance in the Oregon region is roughly $1.53 per hour x 0.554 hours = $0.85.

Running the OpenACC Multicore version of CloverLeaf is slightly cheaper than running the MPI version on the same hardware in this particular case.

OpenACC Parallel Version on 1 GPU

Stop your instance so we can prepare an NVIDIA Volta V100 GPU to accelerate CloverLeaf via OpenACC. AWS doesn’t provide access to GPU-enabled instance types by default so users must first request access. Check the EC2 Service Limits page to see if you have access to p3.2xlarge and p3.8xlarge instance types. If not, submit a request to AWS via the “Request limit increase” link.

Once you have access to p3 instance types, change the instance type to p3.2xlarge and start the instance. The p3.2xlarge instance type provides eight virtualized CPU cores and one virtualized V100 GPU, which should provide a substantial speedup over the serial version. Amazon charges a higher rate for it accordingly: $3.06 per hour.

We are going to use a slightly different version of CloverLeaf for the next couple of experiments. This version has been modified to better support running CloverLeaf on multiple GPUs on a single system. To obtain this version of CloverLeaf, issue the following command:

$ git clone https://github.com/UoB-HPC/CloverLeaf-OpenACC

Once again, we need to fix up a few things in the Makefile:

$ sed -i -e 's#-ta=tesla,cc60#-ta=nvidia,cc35,cc60,cc70 -DUSE_CUDA_AWARE_MPI#g' CloverLeaf-OpenACC/Makefile

Now invoke the build as follows:

$ cd CloverLeaf-OpenACC
$ make COMPILER=PGI

Once the build completes, copy the same clover_bm32.in input file as described previously:

$ mv clover.in clover.in.bak
$ cp InputDecks/clover_bm32.in clover.in

You can now run the OpenACC version of CloverLeaf on a single V100 GPU as follows:

$ mpirun -np 1 ./clover_leaf

A single V100 completes an entire run of this application in just over three minutes:

Average time per cell    2.1036212802534272E-009
Step time per cell       2.0812310847557253E-009
Step    2955 time   2.1820157 control    sound    timestep   7.45E-04       1,       1 x  6.51E-04 y  1.30E-03
Wall clock     183.3258261680603

Even more impressively, the total cost of running the OpenACC version of CloverLeaf on the p3.2xlarge instance is roughly $3.06 per hour x 0.051 hours = $0.15.

GPU-accelerated computing both saves a lot of time when running an application and saves a lot of money as well.
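
One way to see why the GPU run is cheaper despite the higher hourly rate is to compare the speedup it needs to break even with the speedup it actually delivers. A more expensive instance costs the same per run when it is faster by exactly the ratio of the two hourly rates. The Python sketch below works this out for the p3.2xlarge and c5.9xlarge runs above. The function name is hypothetical.

```python
def break_even_speedup(expensive_rate, cheap_rate):
    """Speedup the pricier instance needs for both runs to cost the same."""
    return expensive_rate / cheap_rate


MPI_SECONDS = 2093.818086147308   # 36 cores on c5.9xlarge at $1.53 per hour
GPU_SECONDS = 183.3258261680603   # 1 V100 on p3.2xlarge at $3.06 per hour

needed = break_even_speedup(3.06, 1.53)
measured = MPI_SECONDS / GPU_SECONDS
print(f"needed {needed:.1f}x, measured {measured:.1f}x")
```

Any measured speedup above the break-even value means the faster instance is also the cheaper one for that run.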

OpenACC Parallel Version on 4 GPUs

Now let’s try a really fun experiment to fully showcase the power of GPU-accelerated computing by harnessing the power of multiple GPUs in parallel to run the same CloverLeaf problem.

For this experiment, you will use four V100 GPUs to accelerate CloverLeaf via OpenACC. Bring down your p3.2xlarge instance and change the instance type to p3.8xlarge. This instance type provides 32 virtualized CPU cores and four virtualized V100 GPUs. As this is one of the most powerful instance types AWS EC2 offers, its cost is reflected accordingly: Amazon charges around $12 per hour to use a p3.8xlarge instance. Fortunately, you will not be using this one for very long.

We reuse the same GPU-enhanced version of the CloverLeaf source code as in the earlier section, so you don’t need to download or rebuild it here. Simply change to the CloverLeaf-OpenACC directory and run CloverLeaf as follows:

$ cd CloverLeaf-OpenACC
$ mpirun -np 4 ./clover_leaf

Notice that four GPUs whiz through this CloverLeaf problem in under a minute:

Average time per cell    5.6823226477422412E-010
Step time per cell       5.6024065189477467E-010
Step    2955 time   2.1820157 control    sound    timestep   7.45E-04       1,       1 x  6.51E-04 y  1.30E-03
Wall clock     49.52016711235046

Not surprisingly, we get a nearly 4x speed-up over the single-GPU experiment (about 3.7x, from 183 seconds down to 50). This improved performance mostly makes up for the more expensive multi-GPU instance type, as the cost is roughly the same: $12.24 per hour x 0.0138 hours = $0.17.

Multiple-GPU instance types can be very cost-effective, especially when running larger, time-consuming parallel-capable applications.
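
The same kind of calculation shows how well the problem scales across GPUs. The Python sketch below computes the scaling factor, the scaling efficiency, and the relative cost per run for the one-GPU and four-GPU experiments, using the wall clock times and hourly rates quoted in this article. The variable names are illustrative.

```python
ONE_GPU_SECONDS = 183.3258261680603    # p3.2xlarge at $3.06 per hour
FOUR_GPU_SECONDS = 49.52016711235046   # p3.8xlarge at $12.24 per hour

# Speedup gained by going from one GPU to four.
scaling = ONE_GPU_SECONDS / FOUR_GPU_SECONDS

# Fraction of the ideal 4x speedup that was achieved.
efficiency = scaling / 4

# Cost of the four-GPU run relative to the one-GPU run.
cost_ratio = (12.24 * FOUR_GPU_SECONDS) / (3.06 * ONE_GPU_SECONDS)

print(f"scaling {scaling:.2f}x, efficiency {efficiency:.1%}, "
      f"relative cost {cost_ratio:.2f}")
```

Because the hourly rate grows by exactly 4x while the speedup is slightly less than 4x, the four-GPU run costs slightly more per run but returns the answer much sooner.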

Results Summary

Below is a table summarizing all of our results.

Version                       Time (seconds)   Cost
1 Skylake Core                13536            $0.64
36 Skylake Cores (OpenMP)     2172             $0.92
36 Skylake Cores (MPI)        2094             $0.89
36 Skylake Cores (OpenACC)    1995             $0.85
1 V100 GPU (OpenACC)          183              $0.15
4 V100 GPUs (OpenACC)         50               $0.17

Shutting Down the Instance

One important item that bears repeating is that your instance continues to accrue charges as long as it is running. You should shut down (“stop”) your instances whenever you are not using them to avoid unnecessary fees. To do this, bring up the EC2 Dashboard, find the running instance you need to shut down in the list of instances, right click on it, and select Instance State followed by Stop as illustrated in figure 15.

Stopping an AWS instance figure
Figure 15. Stopping a Running Instance

This can take a few minutes, so verify that it has stopped before closing your browser. Figure 16 shows the EC2 Dashboard when the instance has reached the Stopped state.

Stopping instance state image
Figure 16. Stopped Instance State

IMPORTANT NOTE: The default Terminate action means the instance will be removed and its associated root storage (EBS volume) will be deleted. Do not change your instance state to Terminate unless you are finished with your instance and wish to delete it.

Conclusion

Using the PGI AMI on AWS, you can access GPU-accelerated computing for very little investment. Using Amazon’s EC2 cloud computing platform, we sped up a sample application from running in about three and three-quarter hours on a single CPU core to just under a minute using four state-of-the-art NVIDIA Volta V100 GPUs. At the same time, accelerating the application with GPUs resulted in significant cost savings.

Chris Parrott is a DevTech Software Engineer working with the PGI compilers and tools product group.  In his current role, he wears a lot of different hats, including applications enablement, software testing, performance benchmarking and analysis, and release engineering of third-party application libraries.  He can be reached at cparrott@nvidia.com.

