ALEF powers complex HPC financial simulations 30% faster with OCI
October 11, 2023 | 8 minute read
Authored by Kellsey Ruppel, principal product marketing director at Oracle.
Figure 1: ALEF offers solutions to the problems of financial management and action on capital markets.
With roots in academia, the Advanced Laboratory for Economics and Finance (ALEF) develops software that helps insurance companies, banks, government agencies, and other customers with some of their biggest financial challenges. These challenges include managing investment portfolios, mitigating risk, valuing assets, and assessing contracts. The complex financial simulations that ALEF’s software runs often require massive amounts of data and high-performance computing resources.
Goals for cloud migration
Before ALEF adopted several Oracle Cloud solutions, customers ran ALEF’s software on their own on-premises IT infrastructure. They were responsible for managing and securing that infrastructure and complying with security and other regulations. But many customers didn’t have the resources to keep up.
ALEF decided it needed a higher-performing and more cost-effective, scalable, and secure cloud-based solution.
Why ALEF chose Oracle
After testing cloud infrastructure services from Oracle, Amazon Web Services (AWS), and Microsoft, ALEF chose Oracle Cloud Infrastructure (OCI) to handle its data-intensive workloads. OCI checked all its boxes: superior price-performance, scalability, and security.
“We tried other clouds before discovering Oracle Cloud Infrastructure,” said Pietro Lascari, delivery manager at ALEF. “But for our specific applications and use cases, OCI proved to be the best enterprise-grade solution. The scalability, flexibility, and security were key to our success.”
Suite of Oracle products used
OCI includes all the services needed to migrate, build, and run IT in the cloud, from existing enterprise workloads to new cloud native applications and data platforms. ALEF used the following OCI services and technologies:
- Region: An OCI region is a localized geographic area that contains one or more data centers, called availability domains. Regions are independent of other regions, and vast distances can separate them (across countries or even continents).
- Availability domain: Availability domains are standalone, independent data centers within a region. The physical resources in each availability domain are isolated from the resources in the other availability domains, which provides fault tolerance. Availability domains don’t share infrastructure such as power or cooling, or the internal availability domain network. So, a failure at one availability domain is unlikely to affect the other availability domains in the region.
- Virtual cloud network (VCN) and subnets: A VCN is a customizable, software-defined network that you set up in an OCI region. Like traditional data center networks, VCNs give you complete control over your network environment. A VCN can have multiple nonoverlapping CIDR blocks that you can change after you create the VCN. You can segment a VCN into subnets, which can be scoped to a region or to an availability domain. Each subnet consists of a contiguous range of addresses that don't overlap with the other subnets in the VCN. You can change the size of a subnet after creation. A subnet can be public or private.
- VPN Connect: VPN Connect provides site-to-site IPSec VPN connectivity between your on-premises network and VCNs in OCI. The IPSec protocol suite encrypts IP traffic before the packets are transferred from the source to the destination and decrypts the traffic when it arrives.
- Network address translation (NAT) gateway: A NAT gateway enables private resources in a VCN to access hosts on the internet, without exposing those resources to incoming internet connections.
- Dynamic routing gateway (DRG): The DRG is a virtual router that provides a path for private network traffic between VCNs in the same region, and between a VCN and a network outside the region, such as a VCN in another OCI region, an on-premises network, or a network in another cloud provider.
- Remote peering: Remote peering allows resources in VCNs in different regions to communicate using private IP addresses, without routing the traffic over the internet or through your on-premises network. Remote peering eliminates the need for an internet gateway and public IP addresses for the instances that need to communicate with another VCN in a different region.
- Compute: The OCI Compute service enables you to provision and manage Compute hosts in the cloud. You can launch Compute instances with shapes that meet your resource requirements for CPU, memory, network bandwidth, and storage. After creating a compute instance, you can access it securely, restart it, attach and detach volumes, and terminate it when you no longer need it.
- File Storage: The OCI File Storage service provides a durable, scalable, secure, enterprise-grade network file system. You can connect to a File Storage service file system from any bare metal, virtual machine, or container instance in a VCN. You can also access a file system from outside the VCN by using OCI FastConnect and IPSec VPN.
- Cloud Guard: You can use Oracle Cloud Guard to monitor and maintain the security of your resources in OCI. Cloud Guard uses detector recipes that you can define to examine your resources for security weaknesses and to monitor operators and users for risky activities. When any misconfiguration or insecure activity is detected, Cloud Guard recommends corrective actions and assists with taking those actions, based on responder recipes that you can define.
- Autonomous Database: OCI autonomous databases are fully managed, preconfigured database environments that you can use for transaction processing and data warehousing workloads. You don’t need to configure or manage any hardware or install any software. OCI handles creating the database and backing up, patching, upgrading, and tuning the database.
- Virtual Machine (VM) Database System: Oracle VM Database System is an OCI database service that enables you to build, scale, and manage full-featured Oracle databases on virtual machines. A VM database system uses OCI Block Volumes storage instead of local storage and can run Oracle Real Application Clusters (RAC) to improve availability.
- Exadata Database System: Exadata Cloud Service enables you to leverage the power of Exadata in the cloud. You can provision flexible X8M systems that allow you to add database compute servers and storage servers to your system as your needs grow. X8M systems offer RDMA over Converged Ethernet (RoCE) networking for high bandwidth, low latency, persistent memory (PMEM) modules, and intelligent Exadata software. You can provision X8M systems by using a shape that's equivalent to a quarter-rack X8 system, and then add database and storage servers at any time after provisioning.
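The VCN entry above notes that subnets within a VCN must use address ranges that don't overlap. As a minimal sketch of how an address plan could be checked before provisioning, the following Python snippet uses only the standard `ipaddress` module. The CIDR blocks and the subnet names (`frontend`, `hpc`, `database`) are hypothetical and aren't taken from ALEF's deployment.

```python
import ipaddress
from itertools import combinations


def find_overlaps(cidrs):
    """Return every pair of CIDR blocks in `cidrs` that overlap."""
    networks = [ipaddress.ip_network(c) for c in cidrs]
    return [
        (str(a), str(b))
        for a, b in combinations(networks, 2)
        if a.overlaps(b)
    ]


def subnets_fit_vcn(vcn_cidr, subnet_cidrs):
    """Check that every subnet range lies inside the VCN's CIDR block."""
    vcn = ipaddress.ip_network(vcn_cidr)
    return all(ipaddress.ip_network(s).subnet_of(vcn) for s in subnet_cidrs)


# Hypothetical address plan, for illustration only.
vcn_cidr = "10.0.0.0/16"
subnets = {
    "frontend": "10.0.1.0/24",
    "hpc": "10.0.2.0/24",
    "database": "10.0.3.0/24",
}

overlaps = find_overlaps(list(subnets.values()))
fits = subnets_fit_vcn(vcn_cidr, list(subnets.values()))
print("overlapping pairs:", overlaps)
print("all subnets inside VCN:", fits)
```

A check like this runs offline against a planned layout, so it doesn't depend on any OCI API.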
Migration path
The architecture for delivering ALEF’s insurance data systems (IDS) software follows a traditional client-server architecture.
Users access the client machines in a frontend subnet through Remote Desktop Protocol (RDP). On the client machines, users provide input data and prepare and submit workloads. The backend consists of high-performance computing (HPC) nodes designed for running large batch jobs, such as risk analysis, simulations, and other computationally intensive workloads. The IDS software relies on OCI Database for data persistence.
Figure 2: The architecture for ALEF’s solution.
Considerations
ALEF considered the following points when deploying this architecture:
- Performance: ALEF uses both bare metal and VM Compute instances to run its workloads. For HPC workloads, ALEF uses bare metal Compute shapes, which provide higher bandwidth, large memory, NVMe storage, and complete isolation. The shape of each Compute instance is chosen based on the complexity and intensity of the workload and the computing power required.
These Compute instances run both Oracle Linux and Windows operating systems. ALEF also uses custom Linux-based images.
ALEF uses instance pools to autoscale its Compute nodes whenever computing power must be adjusted dynamically to meet user needs.
- Security: ALEF was an early user of Oracle Cloud Guard. Because ALEF's data consists of highly sensitive financial and insurance data, the company chose to secure it with the simple but powerful security features in Cloud Guard. ALEF has ensured that none of its resources reside in public subnets: all resources are deployed in private subnets, and NAT gateways provide access to the public internet. IPSec VPN lets users communicate from their on-premises networks to OCI through a dynamic routing gateway (DRG).
- Availability: ALEF follows OCI high-availability best practices by spreading its compute nodes across different availability domains and fault domains. ALEF uses remote peering between its two regions to maintain availability if disaster recovery is needed.
- Scalability: OCI File Storage scales well, especially for parallel workloads. ALEF uses OCI File Storage connected to Oracle Database systems to store the Oracle archive redo logs. OCI File Storage also exposes the shared NFS mount point to HPC cluster nodes.
ALEF uses the backup feature of the OCI Block Volume service to store backups, using its own custom-defined policies, in its home region (eu-frankfurt-1) and a secondary region (uk-london-1).
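The availability consideration above describes spreading compute nodes across availability domains and fault domains. One simple way to picture that placement is a round-robin assignment, sketched below in plain Python. The node names, the `AD-1` style availability domain labels, and the `spread_nodes` helper are hypothetical and used only for illustration. The source doesn't describe the placement logic ALEF actually uses.

```python
from itertools import cycle


def spread_nodes(node_names, availability_domains, fault_domains):
    """Assign each node an (availability domain, fault domain) slot, round-robin."""
    # Vary the availability domain fastest, so consecutive nodes land in
    # different data centers before any fault domain is reused.
    slots = [(ad, fd) for fd in fault_domains for ad in availability_domains]
    return dict(zip(node_names, cycle(slots)))


# Hypothetical inputs, for illustration only.
nodes = [f"hpc-{i}" for i in range(6)]
ads = ["AD-1", "AD-2", "AD-3"]
fds = ["FAULT-DOMAIN-1", "FAULT-DOMAIN-2"]

placement = spread_nodes(nodes, ads, fds)
for node, (ad, fd) in placement.items():
    print(node, ad, fd)
```

With six nodes and six available slots, each node gets its own slot, so losing any single fault domain or availability domain affects only part of the cluster.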
Results
ALEF implemented several solutions, all running on OCI. First, Oracle HPC gives ALEF the Compute resources of the most robust on-premises solutions with the elasticity and consumption-based pricing of the cloud. When ALEF benchmarked its HPC workloads on competing cloud infrastructure providers, it found that its workloads run 30% faster on OCI.
One HPC application that ALEF offers to insurance companies lets them run complex Monte Carlo simulations for risk analysis faster and at lower cost. These simulations predict the probability of different outcomes when random variables intervene. Thanks to the ease of use and availability of HPC resources on OCI, ALEF can deploy a range of other applications for customers in just a few weeks.
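For readers unfamiliar with the technique, here is a minimal sketch of a Monte Carlo risk estimate in Python, using only the standard library. It isn't ALEF's model. The normally distributed annual return, the parameter values, and the `simulate_loss_probability` helper are assumptions chosen for illustration. The idea is to draw many random scenarios and count how often the outcome of interest occurs.

```python
import random


def simulate_loss_probability(n_trials, mean_return, volatility,
                              loss_threshold, seed=0):
    """Estimate the probability that a yearly loss exceeds `loss_threshold`.

    Assumes, purely for illustration, that the annual return follows a
    normal distribution with the given mean and volatility.
    """
    rng = random.Random(seed)  # fixed seed keeps the run reproducible
    breaches = 0
    for _ in range(n_trials):
        annual_return = rng.gauss(mean_return, volatility)
        if annual_return < -loss_threshold:
            breaches += 1
    return breaches / n_trials


# Hypothetical portfolio: 5% expected return, 15% volatility.
probability = simulate_loss_probability(
    n_trials=100_000,
    mean_return=0.05,
    volatility=0.15,
    loss_threshold=0.10,
)
print(f"Estimated probability of losing more than 10%: {probability:.3f}")
```

In this toy case, a loss above 10% sits exactly one standard deviation below the mean, so the estimate should land near the analytic value of about 0.159. Production simulations involve many correlated random variables and far more scenarios. Because each trial is independent, the work parallelizes well across HPC nodes.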
The laboratory calculated a 70% performance improvement for some applications on OCI compared with when they ran on-premises. In its G-Consol application, some processes that once took more than 30 hours to complete now take just a few hours. ALEF's grid computing applications, including Disar, benefit from the modern, state-of-the-art engineered hardware and services that OCI offers.
With ALEF’s managed services model, customers are benefitting from the additional security capabilities of OCI, which are critical given the company’s handling of sensitive financial data. Pietro Lascari, ALEF delivery manager, says he likes OCI’s “security by default” approach and its easy-to-use tools “that help us keep the security posture we need for us and our customers.”
To help customers avoid breaches and comply with regulations, the laboratory uses Oracle Cloud Guard to identify existing and potential security threats.
Next steps
In the future, ALEF plans to implement Oracle machine learning to run its big data processing jobs with Hadoop and Spark. It's also looking to list its risk analysis software-as-a-service (SaaS) offering on Oracle Cloud Marketplace.
For more information on ALEF and Oracle Cloud Infrastructure, see the following resources: