European bank: Big data solution migrated from SAS to open source retaining ≈100% accuracy


SENLA supports a major bank’s transition from SAS to open source for greater scalability and lower costs.

Location: Europe

Employees: 200,000+

Industry: Banking & Finance

Customers: 100+ million

The Challenge

A major bank's big data was stored and processed on one dedicated server, with little scalability. The bank wanted to store the data on-premise for safety reasons and process it more cost-effectively.

The Solution

SENLA helped migrate the credit risk calculation solution to an open-source on-premise distributed data storage platform.

The Value

The new solution works with ≈100% calculations convergence, provides increased scalability and performance, and allows for better data safety and significant financial savings.


What’s so big about big data?

Big data is the boss of all data, the massive flood of information, both organized and chaotic, that bombards companies every day. Examples include customer databases, medical records, search queries, transaction records, trading data, market feeds, engagement metrics, purchase histories, and more.

It’s a rich source of invaluable information about your business and the direction in which it is (and should be) heading, ready to be monetized. But only if you’re prepared to tap into this superpower along with the 97% of mainstream organizations that are investing in the quality, accuracy, and completeness of their data, per the 2022 executive survey from NewVantage Partners.

The same report shows that 92.1% of these businesses already acknowledge that their big data efforts are yielding measurable business outcomes, up from just 48.4% in 2017 and 70.3% in 2020.

Need a big solution for your big data project?

Our 600+ top-class experts are up for the job

Tailor your perfect solution

The risks of banking

One of the many uses of big data technologies is analyzing large datasets for informed decision-making, along with implementing policies and procedures that ensure data quality, security, and regulatory compliance.

In Banking & Finance, one example of such use is calculating and identifying credit risks.

Most countries require by law that banks hold extra capital to protect against unforeseen financial emergencies. Banks granting riskier loans must hold more capital than banks with safer loan books. To ensure that financial institutions measure the credit risk of their loans correctly, they must comply with a set of international standards outlined by the Basel Committee on Banking Supervision. The current ruleset is called Basel III, and Basel IV is in the works to introduce more stringent regulations.
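To make the capital rule concrete, here is a minimal, heavily simplified sketch in Java (the language used later in this project): each loan’s exposure is multiplied by a risk weight, and the bank must hold capital of at least 8% of the resulting risk-weighted assets, which is the Basel III minimum. The asset classes, weights, and amounts below are illustrative standardized-approach examples, not the Client’s actual parameters.

    // Illustrative only: real Basel calculations involve ratings, buffers,
    // credit risk mitigation, and many more asset classes.
    public class CapitalRequirementSketch {

        // Simplified standardized-approach risk weights (fraction of exposure).
        static double riskWeight(String assetClass) {
            switch (assetClass) {
                case "residential_mortgage": return 0.35; // safer: lower weight
                case "corporate_unrated":    return 1.00; // riskier: full weight
                default: throw new IllegalArgumentException(assetClass);
            }
        }

        public static void main(String[] args) {
            double mortgages  = 50_000_000; // EUR, hypothetical portfolio
            double corporates = 20_000_000; // EUR, hypothetical portfolio

            double rwa = mortgages  * riskWeight("residential_mortgage")
                       + corporates * riskWeight("corporate_unrated");

            // Basel III: total capital must be at least 8% of risk-weighted assets.
            double minimumCapital = 0.08 * rwa;
            System.out.printf("RWA: %.0f EUR, minimum capital: %.0f EUR%n",
                    rwa, minimumCapital);
        }
    }

Riskier loans carry higher weights and so translate directly into more capital that must be set aside, which is why measuring credit risk accurately matters so much.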

To stay compliant with Basel regulations, our Client, a leading European financial institution, developed and successfully implemented a credit risk measurement solution some years ago, built on the SAS® Regulatory Risk Management (SAS RRM) platform.

SAS RRM was a cloud-based analytics environment with a centralized model of data processing. It allowed for effectively identifying, evaluating, and managing various risks under all Basel standards, providing decision support and business value. For several years, it was the perfect solution.

The call for change

Yet time refuses to stand still. With expanding loan portfolios, ever-tightening productivity demands, and the introduction of new technologies, our Client began considering optimizing costs and modernizing their SAS-based solution.

The problem with SAS was that all the organization’s data was stored and processed on one dedicated server. To scale the process and increase throughput, our Client would have had to purchase more costly server resources.

This approach was quickly deemed economically impractical: while risk calculation was a crucial procedure, it didn’t run 24/7. Any extra server capacity the Client purchased would often sit idle, and paying for unused resources would be too expensive. Something more scalable and cost-efficient was called for.

The Client also opted for an on-premises data storage solution over a cloud-based one, aiming for the following advantages:

  • Increased security. The average cost of a data breach reached an all-time high of $4.35 million in 2022, per the IBM Security report, while approximately 15 million data records were exposed worldwide. Self-hosted storage would make it more difficult for remote hackers to gain access to the data.
  • Offline usage. Non-cloud data warehouses can be used without an internet connection, so the systems would never go offline because of an unstable connection.
  • Control over the resources. With in-house platforms, the organization would have full control and total ownership over the data and all resources in use.

At the same time, it was important to the Client that the new solution work within the existing ecosystem, match the functionality of SAS, and lose nothing in performance.

Solution: let’s go open-source

The answer to all the questions came quickly: go open-source!

The Apache Hadoop platform was chosen for data storage, while Apache Spark was the intended framework for operating it. Java was chosen as the programming language because of the team’s strong expertise and Spark’s better API support for it.

Hadoop is an open-source distributed storage platform: it is free, and it scales horizontally by connecting more computing nodes when higher performance is required.

Apache Spark is a fast data processing framework that can quickly perform tasks on very large datasets (up to petabytes) and can distribute processing tasks across multiple computers for enhanced scalability and less wasted capacity. It utilizes in-memory caching and optimized query execution for fast analytic queries against data of any size.
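For a sense of what this looks like in practice, here is a minimal sketch of a Spark job in Java of the kind described above: it reads a large dataset from Hadoop’s distributed storage, caches it in memory, and aggregates it across the cluster. The HDFS path and column names are hypothetical, not taken from the Client’s system.

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;
    import static org.apache.spark.sql.functions.*;

    public class ExposureAggregation {
        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder()
                    .appName("ExposureAggregation")
                    .getOrCreate();

            // Hypothetical path; Hadoop HDFS provides the distributed storage layer.
            Dataset<Row> exposures = spark.read().parquet("hdfs:///risk/exposures");

            // In-memory caching: later actions reuse the cached partitions
            // instead of re-reading from disk.
            exposures.cache();

            // The aggregation runs in parallel across however many worker
            // nodes the cluster currently has; scaling up means adding nodes.
            Dataset<Row> bySegment = exposures
                    .groupBy(col("segment"))
                    .agg(sum(col("exposure_amount")).alias("total_exposure"));

            bySegment.show();
            spark.stop();
        }
    }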

Both were perfect for the Client’s business needs. All that was missing was a strong team of experts capable of performing the technically challenging task of transitioning a complex big data solution from SAS to Hadoop.

The difficulty lies in the differences between SAS and open-source technologies. SAS is a unique environment with its own architecture, syntax, macros, and data processing principles. Migrating a solution from SAS to another big data platform therefore requires extensive changes to the architecture and system logic.
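One hypothetical example of what that re-engineering looks like: SAS logic is often packaged as parameterized macros, which have no direct Spark counterpart, so their behavior is typically re-expressed as ordinary methods over Spark Datasets. Neither the macro nor the column names below come from the Client’s codebase.

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import static org.apache.spark.sql.functions.*;

    public class SasMacroRewrite {
        // A SAS macro such as:
        //   %macro exposure_by(class_var);
        //     proc means data=loans sum;
        //       class &class_var;
        //       var exposure;
        //     run;
        //   %mend;
        // becomes a plain method; the macro parameter is now a method argument.
        static Dataset<Row> exposureBy(Dataset<Row> loans, String classVar) {
            return loans.groupBy(col(classVar))
                        .agg(sum(col("exposure")).alias("total_exposure"));
        }
    }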

The team

When SENLA was approached with the task, we were honored and excited. It was the first big data project of this scale we were going to undertake.

Besides, this was a perfect moment to implement one of the pillars of SENLA’s philosophy: taking care of our experts at all times while providing them with constant opportunities to grow and develop their skills.

“The possibility to learn the SAS environment was the most captivating aspect of the project. Well, that and figuring out how to implement the unique macro-based features of SAS through the possibilities of Spark.”

Marharyta Ihnatsyeva, SENLA Data Engineer

Our experts enthusiastically delved into learning the SAS environment at the SAS Academy, driven by a desire to gain a profound understanding of the challenges at hand. After completing the courses, they were quickly integrated into the work on the project.

Testing, testing, testing

SENLA’s work on the solution lasted 8 months before the release. Each day, the team collaborated closely with the bank’s experts to ensure full compliance with all requirements.

It was crucial that the performance of the new solution remained the same as that of the previous SAS-based system. To accomplish this, our experts conducted multiple testing sessions of each module and the final testing of the whole system. They relied on the data from the Client’s side and employed various testing techniques.

“After receiving an example of data from the bank’s analysts, we first launched the data processing on the SAS solution and took those results as the standard. Then the process was repeated on the Spark-based system. For this, we developed a custom application that ran this process. By comparing both outcomes, we finally achieved the same level of accuracy as SAS, and sometimes even higher.”

Yauheni Belanovich, SENLA Data Engineer
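The comparison step the quote describes could look something like the following sketch: SAS output is treated as the golden standard, joined with the Spark output on a record key, and any difference beyond a tolerance is counted as a mismatch. The table layout, column names, and tolerance value here are assumptions for illustration, with the tolerance mirroring the “thousandths” convergence reported below.

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import static org.apache.spark.sql.functions.*;

    public class ConvergenceCheck {
        // Returns the number of records whose SAS and Spark results diverge
        // by more than the tolerance; zero means the two runs converge.
        static long mismatches(Dataset<Row> sasResults, Dataset<Row> sparkResults) {
            double tolerance = 0.001; // "thousandths after the decimal"
            return sasResults.alias("sas")
                    .join(sparkResults.alias("spark"), "record_id")
                    .withColumn("diff",
                            abs(col("sas.risk_value").minus(col("spark.risk_value"))))
                    .filter(col("diff").gt(tolerance))
                    .count();
        }
    }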

Because of the system’s complexity and depth, the most challenging part of the project was bringing all of its modules together before the release. The performance fine-tuning, final testing, and code refactoring took another 2 months.

The value: ≈100% calculations convergence

“SENLA’s experts took this project very earnestly, looking at it far wider than the outlined tasks required. They even managed to find errors in the original SAS solution that had already been used for 4 years without anyone noticing!”

Client’s Representative

The Client’s big data solution was effectively migrated from SAS RRM to the open-source platform Hadoop operated by Spark in under 1 year and without disrupting the business.

The project’s primary outcome was proving the feasibility of transferring a massive, critically important process to open source while fulfilling all the Client’s requirements. Other accomplishments include:

  • The solution works with ≈100% calculations convergence, accurate to thousandths after the decimal point, which is a tremendous result.
  • The horizontally distributed nature of Hadoop made scalability a matter of simply engaging more data nodes when and if required, without wasting capacity.
  • Spark accounted for the increased performance of calculations compared to the old implementation, all without license fees or reliance on proprietary software.
  • All security concerns were fully covered by the distributed on-premises solution, allowing for better control and data safety.
  • Migration from SAS to open source achieved significant financial savings for the Client.

And SENLA’s experts gained invaluable expertise in big data solutions, ready to be implemented in projects of any complexity. Maybe yours?

Why SENLA?

Big data gurus

We understand the great importance of big data solutions and make them one of our priority services. Our Data Engineers take relevant courses and constantly adapt to the latest trends in technology.

Successful big data projects

Our experts pair one of the best-in-class theoretical foundations with regular fieldwork, having delivered numerous profitable projects.

Direct communication

You, your tech lead, or your project manager communicate with the dedicated development team directly. No middlemen, no miscommunication.

Frequently Asked Questions

How soon can we start working?

Our experts can get to work in as little as 7–10 business days after the introductory call, depending on the engagement model you choose.

I've already started development with a different vendor but I’m not happy with the results. Can you help?

If you aren't satisfied with your existing partnership, you can transition the work to us. We'll take the ball and run with it.

Can you help me with my data management after the project is finished?

Sure! We strive to form partnerships, not just deliver projects. Our cooperation, backed by a lifetime warranty, doesn’t stop with development: we provide continuous support for our solutions.

Request an offer

More than seven hundred technical experts are ready to work
