Publish and enrich real-time financial data feeds using Amazon MSK and Amazon Managed Service for Apache Flink


Financial data feeds are real-time streams of stock quotes, commodity prices, options trades, or other real-time financial data. Companies involved in capital markets such as hedge funds, investment banks, and brokerages use these feeds to inform investment decisions.

Financial data feed providers are increasingly being asked by their customers to deliver the feed directly to them through the AWS Cloud. That's because their customers already have infrastructure on AWS to store and process the data and want to consume it with minimal effort and latency. In addition, the AWS Cloud's cost-effectiveness enables even small and mid-size companies to become financial data providers. They can deliver and monetize data feeds that they have enriched with their own valuable information.

An enriched data feed can combine data from multiple sources, including financial data feeds, to add information such as stock splits, corporate mergers, volume alerts, and moving average crossovers to a basic feed.

In this post, we demonstrate how you can publish an enriched real-time data feed on AWS using Amazon Managed Streaming for Apache Kafka (Amazon MSK) and Amazon Managed Service for Apache Flink. You can apply this architecture pattern to various use cases within the capital markets industry; we discuss some of those use cases in this post.

Apache Kafka is a high-throughput, low-latency distributed event streaming platform. Financial exchanges such as Nasdaq and NYSE are increasingly turning to Kafka to deliver their data feeds because of its unique capabilities in handling high-volume, high-velocity data streams.

Amazon MSK is a fully managed service that makes it easy for you to build and run applications on AWS that use Kafka to process streaming data.

Apache Flink is an open source distributed processing engine that offers powerful programming interfaces for both stream and batch processing, with first-class support for stateful processing, event time semantics, checkpointing, snapshots, and rollback. Apache Flink supports multiple programming languages (Java, Python, Scala, and SQL) and multiple APIs with different levels of abstraction, which can be used interchangeably in the same application.

Amazon Managed Service for Apache Flink is a fully managed, serverless experience for running Apache Flink applications. Customers can easily build real-time Flink applications using any of Flink's languages and APIs.

In this post, we use a real-time stock quotes feed from financial data provider Alpaca and add an indicator when the price moves above or below a certain threshold. The code provided in the GitHub repo allows you to deploy the solution to your AWS account. This solution was built by AWS Partner NETSOL Technologies.

Solution overview

In this solution, we deploy an Apache Flink application that enriches the raw data feed, an MSK cluster that contains the message streams for both the raw and enriched feeds, and an Amazon OpenSearch Service cluster that acts as a persistent data store for querying the data. In a separate virtual private cloud (VPC) that acts as the customer's VPC, we also deploy an Amazon EC2 instance running a Kafka client that consumes the enriched data feed. The following diagram illustrates this architecture.

Solution Architecture
Figure 1 – Solution architecture

The following is a step-by-step breakdown of the solution:

  1. The EC2 instance in your VPC is running a Python application that fetches stock quotes from your data provider through an API. In this case, we use Alpaca's API.
  2. The application sends these quotes using the Kafka client library to a Kafka topic on the MSK cluster. The topic stores the raw quotes.
  3. The Apache Flink application takes the Kafka message stream and enriches it by adding an indicator whenever the stock price rises or declines 5% or more from the previous business day's closing price (see the sketch following this list).
  4. The Apache Flink application then sends the enriched data to a separate Kafka topic on your MSK cluster.
  5. The Apache Flink application also sends the enriched data stream to Amazon OpenSearch Service using a Flink connector for OpenSearch. Amazon OpenSearch Service stores the data, and OpenSearch Dashboards allows applications to query the data at any point in the future.
  6. Your customer is running a Kafka consumer application on an EC2 instance in a separate VPC in their own AWS account. This application uses AWS PrivateLink to consume the enriched data feed securely, in real time.
  7. All Kafka user names and passwords are encrypted and stored in AWS Secrets Manager. The SASL/SCRAM authentication protocol used here makes sure all data to and from the MSK cluster is encrypted in transit. Amazon MSK encrypts all data at rest in the MSK cluster by default.
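
The following is a minimal sketch of the indicator logic described in step 3, written in plain Java. It is illustrative only; the class name, method names, and threshold handling are assumptions for this post, not the repo's actual Flink code.

// Illustrative sketch of the 5% indicator rule (not the repo's actual code).
public class IndicatorCalculator {

    // previousClose is assumed to be the prior business day's closing price.
    public static String indicatorFor(double currentPrice, double previousClose) {
        double percentChange = (currentPrice - previousClose) / previousClose * 100.0;
        if (percentChange >= 5.0) {
            return "Bullish";   // price rose 5% or more from the previous close
        } else if (percentChange <= -5.0) {
            return "Bearish";   // price fell 5% or more from the previous close
        }
        return "Neutral";       // no significant movement
    }

    public static void main(String[] args) {
        // Example: previous close 100.00, current price 106.50 -> Bullish
        System.out.println(indicatorFor(106.50, 100.00));
    }
}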

The deployment process consists of the following high-level steps:

  1. Launch the Amazon MSK cluster, Apache Flink application, Amazon OpenSearch Service domain, and Kafka producer EC2 instance in the producer AWS account. This step usually completes within 45 minutes.
  2. Set up multi-VPC connectivity and SASL/SCRAM authentication for the MSK cluster. This step can take up to 30 minutes.
  3. Launch the VPC and Kafka consumer EC2 instance in the consumer account. This step takes about 10 minutes.

Prerequisites

To deploy this solution, complete the following prerequisite steps:

  1. Create an AWS account if you don't already have one and log in. We refer to this as the producer account.
  2. Create an AWS Identity and Access Management (IAM) user with full admin permissions. For instructions, refer to Create an IAM user.
  3. Sign out and sign back in to the AWS Management Console as this IAM admin user.
  4. Create an EC2 key pair named my-ec2-keypair in the producer account. If you already have an EC2 key pair, you can skip this step.
  5. Follow the instructions in ALPACA_README to sign up for a free Basic account at Alpaca to get your Alpaca API key and secret key. Alpaca will provide the real-time stock quotes for our input data feed.
  6. Install the AWS Command Line Interface (AWS CLI) on your local development machine and create a profile for the admin user. For instructions, see Set up the AWS Command Line Interface (AWS CLI).
  7. Install the latest version of the AWS Cloud Development Kit (AWS CDK) globally:
    npm install -g aws-cdk@latest

Deploy the Amazon MSK cluster

These steps create a new provider VPC and launch the Amazon MSK cluster there. You also deploy the Apache Flink application and launch a new EC2 instance to run the application that fetches the raw stock quotes.

  1. On your development machine, clone the GitHub repo and install the Python packages:
    git clone https://github.com/aws-samples/msk-powered-financial-data-feed.git
    cd msk-powered-financial-data-feed
    pip install -r requirements.txt

  2. Set the following environment variables to specify your producer AWS account number and AWS Region:
    export CDK_DEFAULT_ACCOUNT={your_AWS_account_no}
    export CDK_DEFAULT_REGION=us-east-1

  3. Run the following commands to create your config.py file:
    echo "mskCrossAccountId = ''" > config.py
    echo "producerEc2KeyPairName = ''" >> config.py
    echo "consumerEc2KeyPairName = ''" >> config.py
    echo "mskConsumerPwdParamStoreValue = ''" >> config.py
    echo "mskClusterArn = ''" >> config.py

  4. Run the following commands to create your alpaca.conf file:
    echo [alpaca] > dataFeedMsk/alpaca.conf
    echo ALPACA_API_KEY=your_api_key >> dataFeedMsk/alpaca.conf
    echo ALPACA_SECRET_KEY=your_secret_key >> dataFeedMsk/alpaca.conf

  5. Edit the alpaca.conf file and replace your_api_key and your_secret_key with your Alpaca API key and secret key.
  6. Bootstrap the environment for the producer account:
    cdk bootstrap aws://{your_AWS_account_no}/{your_aws_region}

  7. Using your editor or integrated development environment (IDE), edit the config.py file:
    1. Update the mskCrossAccountId parameter with your AWS producer account number.
    2. If you have an existing EC2 key pair, update the producerEc2KeyPairName parameter with the name of your key pair.
  8. View the dataFeedMsk/parameters.py file:
    1. If you are deploying in a Region other than us-east-1, update the Availability Zone IDs az1 and az2 accordingly. For example, the Availability Zones for us-west-2 would be us-west-2a and us-west-2b.
    2. Make sure that the enableSaslScramClientAuth, enableClusterConfig, and enableClusterPolicy parameters in the parameters.py file are set to False.
  9. Make sure you are in the directory where the app1.py file is located. Then deploy as follows:
    cdk deploy --all --app "python app1.py" --profile {your_profile_name}

  10. Check that you now have an Amazon Simple Storage Service (Amazon S3) bucket whose name starts with awsblog-dev-artifacts, containing a folder with some Python scripts and the Apache Flink application JAR file.
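
If you prefer the AWS CLI, you can check for the bucket as follows (assuming your admin profile is configured; the exact bucket name varies by account and Region):

aws s3 ls --profile {your_profile_name} | grep awsblog-dev-artifacts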

Deploy multi-VPC connectivity and SASL/SCRAM

Complete the following steps to deploy multi-VPC connectivity and SASL/SCRAM authentication for the MSK cluster:

  1. Set the enableSaslScramClientAuth, enableClusterConfig, and enableClusterPolicy parameters in the parameters.py file to True.
  2. Make sure you're in the directory where the config.py file is located, then deploy the multi-VPC connectivity and SASL/SCRAM authentication for the MSK cluster:

cdk deploy --all --app "python app1.py" --profile {your_profile_name}

This step can take up to 30 minutes.

  3. To check the results, navigate to your MSK cluster on the Amazon MSK console, and choose the Properties tab.

You should see PrivateLink turned on, and SASL/SCRAM as the authentication type.


  4. Copy the MSK cluster ARN.
  5. Edit your config.py file and enter the ARN as the value for the mskClusterArn parameter, then save the updated file.

Deploy the data feed consumer

Complete the steps in this section to create an EC2 instance in a new consumer account to run the Kafka consumer application. The application will connect to the MSK cluster through PrivateLink and SASL/SCRAM.

  1. Navigate to Parameter Store, a capability of AWS Systems Manager, in your producer account.
  2. Copy the value of the blogAws-dev-mskConsumerPwd-ssmParamStore parameter and update the mskConsumerPwdParamStoreValue parameter in the config.py file.
  3. Check the value of the parameter named blogAws-dev-getAzIdsParamStore and make a note of these two values.
  4. Create another AWS account for the Kafka consumer if you don't already have one, and log in.
  5. Create an IAM user with admin permissions.
  6. Log out and log back in to the console using this IAM admin user.
  7. Make sure you are in the same Region as the one you used in the producer account. Then create a new EC2 key pair named, for example, my-ec2-consumer-keypair, in this consumer account.
  8. Update the value of consumerEc2KeyPairName in your config.py file with the name of the key pair you just created.
  9. Open the AWS Resource Access Manager (AWS RAM) console in your consumer account.
  10. Compare the Availability Zone IDs from the Systems Manager Parameter Store with the Availability Zone IDs shown on the AWS RAM console.
  11. Identify the corresponding Availability Zone names for the matching Availability Zone IDs.
  12. Open the parameters.py file in the dataFeedMsk folder and insert these Availability Zone names into the variables crossAccountAz1 and crossAccountAz2. For example, if the values in Parameter Store are "use1-az4" and "use1-az6", then, when you switch to the consumer account's AWS RAM console and compare, you may find that these values correspond to the Availability Zone names "us-east-1a" and "us-east-1b". In that case, you need to update the parameters.py file with those Availability Zone names by setting crossAccountAz1 to "us-east-1a" and crossAccountAz2 to "us-east-1b". (You can also verify this mapping with the AWS CLI; see the command after step 13.)
  13. Set the following environment variables, specifying your consumer AWS account ID:
export CDK_DEFAULT_ACCOUNT={your_aws_account_id}
export CDK_DEFAULT_REGION=us-east-1
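
If you prefer the command line, the following AWS CLI call (run with a profile for the consumer account) lists the Availability Zone ID-to-name mapping referenced in step 12:

aws ec2 describe-availability-zones --region us-east-1 --query "AvailabilityZones[].{Id:ZoneId,Name:ZoneName}" --output table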

  14. Bootstrap the consumer account environment. You need to add specific policies to the AWS CDK role in this case:
    cdk bootstrap aws://{your_aws_account_id}/{your_aws_region} --cloudformation-execution-policies "arn:aws:iam::aws:policy/AmazonMSKFullAccess,arn:aws:iam::aws:policy/AdministratorAccess" --profile {your_profile_name}

You now need to grant the consumer account access to the MSK cluster.

  15. On the console, copy the consumer AWS account number to your clipboard.
  16. Sign out and sign back in to your producer AWS account.
  17. On the Amazon MSK console, navigate to your MSK cluster.
  18. Choose Properties and scroll down to Security settings.
  19. Choose Edit cluster policy and add the consumer account root to the Principal section as follows, then save the changes:
    "Principal": {
        "AWS": ["arn:aws:iam:::root", "arn:aws:iam:::root"]
    },
    

  20. Create the IAM role that must be attached to the EC2 consumer instance:
    aws iam create-role --role-name awsblog-dev-app-consumerEc2Role --assume-role-policy-document file://dataFeedMsk/ec2ConsumerPolicy.json --profile {your_profile_name}

  21. Deploy the consumer account infrastructure, including the VPC, consumer EC2 instance, security groups, and connectivity to the MSK cluster:
    cdk deploy --all --app "python app2.py" --profile {your_profile_name}

Run the applications and view the data

Now that we have the infrastructure up, we can produce a raw stock quotes feed from the producer EC2 instance to the MSK cluster, enrich it using the Apache Flink application, and consume the enriched feed from the consumer application through PrivateLink. For this post, we use the Flink DataStream Java API for the stock data feed processing and enrichment. We also use Flink aggregations and windowing capabilities to identify insights in a certain time window.
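
As an illustration of this style of processing, the following minimal Flink DataStream sketch keys quotes by symbol and tracks the highest price per one-minute tumbling window. The Quote class, field names, and in-line test data are assumptions for the sketch, not the repo's actual job.

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.assigners.TumblingProcessingTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;

public class QuoteAggregationJob {

    // Simple POJO for a raw quote; the real application's schema may differ.
    public static class Quote {
        public String symbol;
        public double price;
        public Quote() {}
        public Quote(String symbol, double price) { this.symbol = symbol; this.price = price; }
    }

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // In the real job, this stream comes from the raw-quotes Kafka topic.
        DataStream<Quote> quotes = env.fromElements(
                new Quote("AMZN", 194.64), new Quote("AMZN", 194.82));

        // Highest price seen per symbol in each one-minute window.
        quotes.keyBy(q -> q.symbol)
              .window(TumblingProcessingTimeWindows.of(Time.minutes(1)))
              .maxBy("price")
              .print();

        env.execute("quote-aggregation-sketch");
    }
}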

Run the managed Flink application

Complete the following steps to run the managed Flink application:

  1. In your producer account, open the Amazon Managed Service for Apache Flink console and navigate to your application.
  2. To run the application, choose Run, select Run with latest snapshot, and choose Run.
  3. When the application changes to the Running state, choose Open Apache Flink dashboard.

You should see your application under Running Jobs.


Run the Kafka producer application

Complete the following steps to run the Kafka producer application:

  1. On the Amazon EC2 console, find the IP address of the producer EC2 instance named awsblog-dev-app-kafkaProducerEC2Instance.
  2. Connect to the instance using SSH and run the following commands:
    sudo su
    cd environment
    source alpaca-script/bin/activate
    python3 ec2-script-live.py AMZN NVDA

You need to start the script during market open hours. This will run the script that creates a connection to the Alpaca API. You should see lines of output showing that it is making the connection and subscribing to the given ticker symbols.

View the enriched data feed in OpenSearch Dashboards

Complete the following steps to create an index pattern to view the enriched data in your OpenSearch dashboard:

  1. To find the master user name for OpenSearch, open the config.py file and locate the value assigned to the openSearchMasterUsername parameter.
  2. Open Secrets Manager and choose the awsblog-dev-app-openSearchSecrets secret to retrieve the password for OpenSearch.
  3. Navigate to your OpenSearch Service console and find the URL to your OpenSearch dashboard by choosing the domain name for your OpenSearch cluster. Open the URL and sign in using your master user name and password.
  4. In the OpenSearch navigation bar on the left, select Dashboards Management under the Management section.
  5. Choose Index patterns, then choose Create index pattern.
  6. Enter amzn* in the Index pattern name field to match the AMZN ticker, then choose Next step.
  7. Select timestamp under Time field and choose Create index pattern.
  8. Choose Discover in the OpenSearch Dashboards navigation pane.
  9. With the amzn* index pattern selected in the dropdown, select the fields to view the enriched quotes data.

The indicator field has been added to the raw data by Amazon Managed Service for Apache Flink to indicate whether the current price direction is neutral, bullish, or bearish.

Run the Kafka consumer application

To run the consumer application to consume the data feed, you first need to get the multi-VPC brokers URL for the MSK cluster in the producer account.

  1. On the Amazon MSK console, navigate to your MSK cluster and choose View client information.
  2. Copy the value of the Private endpoint (multi-VPC).
  3. SSH to your consumer EC2 instance and run the following commands:
    sudo su
    alias kafka-consumer=/kafka_2.13-3.5.1/bin/kafka-console-consumer.sh
    kafka-consumer --bootstrap-server {$MULTI_VPC_BROKER_URL} --topic amznenhanced --from-beginning --consumer.config ./customer_sasl.properties
    

You should then see lines of output for the enriched data feed like the following:

{"image":"AMZN","shut":194.64,"open":194.58,"low":194.58,"excessive":194.64,"quantity":255.0,"timestamp":"2024-07-11 19:49:00","%change":-0.8784661217630548,"indicator":"Impartial"}
{"image":"AMZN","shut":194.77,"open":194.615,"low":194.59,"excessive":194.78,"quantity":1362.0,"timestamp":"2024-07-11 19:50:00","%change":-0.8122628778040887,"indicator":"Impartial"}
{"image":"AMZN","shut":194.82,"open":194.79,"low":194.77,"excessive":194.82,"quantity":1143.0,"timestamp":"2024-07-11 19:51:00","%change":-0.7868000916660381,"indicator":"Impartial"}

In the output above, no significant changes are occurring in the stock prices, so the indicator shows "Neutral". The Flink application determines the appropriate sentiment based on the stock price movement.
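
For reference, the customer_sasl.properties file passed to the console consumer above holds the client's SASL/SCRAM settings. A typical version looks like the following sketch (the user name and password placeholders are assumptions; substitute the consumer credentials retrieved from Secrets Manager):

security.protocol=SASL_SSL
sasl.mechanism=SCRAM-SHA-512
sasl.jaas.config=org.apache.kafka.common.security.scram.ScramLoginModule required \
    username="{your_consumer_username}" \
    password="{your_consumer_password}";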

Additional financial services use cases

In this post, we demonstrated how to build a solution that enriches a raw stock quotes feed and identifies stock movement patterns using Amazon MSK and Amazon Managed Service for Apache Flink. Amazon Managed Service for Apache Flink offers various features such as snapshots, checkpointing, and the recently launched Rollback API. These features allow you to build resilient real-time streaming applications.

You can apply this approach to a variety of other use cases in the capital markets domain. In this section, we discuss other cases in which you can use the same architectural patterns.

Real-time data visualization

Using real-time feeds to create charts of stocks is the most common use case for real-time market data in the cloud. You can ingest raw stock prices from data providers or exchanges into an MSK topic and use Amazon Managed Service for Apache Flink to display the high price, low price, and volume over a period of time. These are known as aggregates and are the foundation for displaying candlestick bar graphs. You can also use Flink to determine stock price ranges over time.
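
As a concrete example, the following Flink AggregateFunction sketch folds a window of trades into a single OHLCV candle, the basis of a candlestick chart. The Trade and Candle types are assumptions for the sketch, not types from the repo.

import org.apache.flink.api.common.functions.AggregateFunction;

// Use with: trades.keyBy(t -> t.symbol).window(...).aggregate(new CandleAggregate())
public class CandleAggregate implements AggregateFunction<CandleAggregate.Trade, CandleAggregate.Candle, CandleAggregate.Candle> {

    public static class Trade {
        public String symbol;
        public double price;
        public double size;
    }

    public static class Candle {
        public double open, high, low, close, volume;
        public boolean empty = true;
    }

    @Override
    public Candle createAccumulator() { return new Candle(); }

    @Override
    public Candle add(Trade t, Candle c) {
        if (c.empty) {                      // first trade in the window sets the open
            c.open = c.high = c.low = t.price;
            c.empty = false;
        }
        c.high = Math.max(c.high, t.price);
        c.low = Math.min(c.low, t.price);
        c.close = t.price;                  // most recent trade becomes the close
        c.volume += t.size;
        return c;
    }

    @Override
    public Candle getResult(Candle c) { return c; }

    @Override
    public Candle merge(Candle a, Candle b) {
        if (a.empty) return b;
        if (b.empty) return a;
        a.high = Math.max(a.high, b.high);
        a.low = Math.min(a.low, b.low);
        a.close = b.close;                  // assume b holds the later trades
        a.volume += b.volume;
        return a;
    }
}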


Stock implied volatility

Implied volatility (IV) is a measure of the market's expectation of how much a stock's price is likely to fluctuate in the future. IV is forward-looking and derived from the current market price of an option. It is also used to price new options contracts and is sometimes referred to as the stock market's fear gauge because it tends to spike higher during market stress or uncertainty. With Amazon Managed Service for Apache Flink, you can consume data from a securities feed that provides current stock prices and combine this with an options feed that provides contract values and strike prices to calculate the implied volatility.
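
The IV calculation itself can be expressed as a root-finding problem: find the volatility at which the Black-Scholes price matches the observed option price. The following self-contained Java sketch backs out a call option's implied volatility by bisection (illustrative math only, not a production pricing library):

public class ImpliedVolatility {

    // Standard normal CDF via the Abramowitz & Stegun 7.1.26 approximation.
    static double normCdf(double x) {
        double t = 1.0 / (1.0 + 0.2316419 * Math.abs(x));
        double d = 0.3989422804014327 * Math.exp(-x * x / 2.0);
        double p = d * t * (0.319381530 + t * (-0.356563782 + t * (1.781477937
                + t * (-1.821255978 + t * 1.330274429))));
        return x >= 0 ? 1.0 - p : p;
    }

    // Black-Scholes price of a European call.
    static double callPrice(double s, double k, double r, double t, double sigma) {
        double d1 = (Math.log(s / k) + (r + sigma * sigma / 2.0) * t) / (sigma * Math.sqrt(t));
        double d2 = d1 - sigma * Math.sqrt(t);
        return s * normCdf(d1) - k * Math.exp(-r * t) * normCdf(d2);
    }

    // Bisect on sigma: the call price is monotonically increasing in volatility.
    static double impliedVol(double marketPrice, double s, double k, double r, double t) {
        double lo = 0.0001, hi = 5.0;
        for (int i = 0; i < 100; i++) {
            double mid = (lo + hi) / 2.0;
            if (callPrice(s, k, r, t, mid) > marketPrice) hi = mid; else lo = mid;
        }
        return (lo + hi) / 2.0;
    }

    public static void main(String[] args) {
        // Example: stock at 100, strike 100, 1% rate, 3 months to expiry, option trading at 4.00.
        System.out.printf("IV = %.4f%n", impliedVol(4.00, 100, 100, 0.01, 0.25));
    }
}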

Technical indicator engine

Technical indicators are used to analyze stock price and volume behavior, provide trading signals, and identify market opportunities, which can help in the decision-making process of trading. Although implied volatility is a technical indicator, there are many other indicators. There can be simple indicators such as the simple moving average (SMA), which represents a measure of trend in a specific stock price based on the average of the price over a period of time. There are also more complex indicators such as the Relative Strength Index (RSI), which measures the momentum of a stock's price movement. RSI is a mathematical formula that uses the exponential moving average of upward movements and downward movements.
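
As a sketch of how such an indicator can be computed incrementally over a stream, the following Java class maintains a streaming RSI using Wilder-style exponential smoothing of upward and downward moves (class and method names are assumptions for illustration):

public class RsiCalculator {
    private final double alpha;       // smoothing factor, 1/period in Wilder's formulation
    private double avgGain = 0.0;
    private double avgLoss = 0.0;
    private Double lastPrice = null;

    public RsiCalculator(int period) { this.alpha = 1.0 / period; }

    // Feed each new closing price; returns the current RSI (0-100).
    public double update(double price) {
        if (lastPrice != null) {
            double change = price - lastPrice;
            avgGain = alpha * Math.max(change, 0.0) + (1 - alpha) * avgGain;
            avgLoss = alpha * Math.max(-change, 0.0) + (1 - alpha) * avgLoss;
        }
        lastPrice = price;
        if (avgLoss == 0.0) return 100.0;  // all gains so far (or no data yet)
        double rs = avgGain / avgLoss;
        return 100.0 - 100.0 / (1.0 + rs);
    }

    public static void main(String[] args) {
        RsiCalculator rsi = new RsiCalculator(14);
        for (double p : new double[]{194.64, 194.77, 194.82, 194.50, 195.10}) {
            System.out.printf("price=%.2f rsi=%.2f%n", p, rsi.update(p));
        }
    }
}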

Market alert engine

Graphs and technical indicators aren't the only tools that you can use to make investment decisions. Other data sources are important, such as ticker symbol changes, stock splits, dividend payments, and others. Investors also act on recent news about the company, its competitors, employees, and other potential company-related information. You can use the compute capacity provided by Amazon Managed Service for Apache Flink to ingest, filter, transform, and correlate the different data sources with the stock prices and create an alert engine that can recommend investment actions based on these alternate data sources. Examples can range from invoking an action if dividend prices increase or decrease to using generative artificial intelligence (AI) to summarize multiple correlated news items from different sources into a single alert about an event.

Market surveillance

Market surveillance is the monitoring and investigation of unfair or illegal trading practices in the stock markets to maintain fair and orderly markets. Both private companies and government agencies conduct market surveillance to uphold rules and protect investors.

You can use Amazon Managed Service for Apache Flink streaming analytics as a powerful surveillance tool. Streaming analytics can detect even subtle instances of market manipulation in real time. By integrating market data feeds with external data sources, such as company merger announcements, news feeds, and social media, streaming analytics can quickly identify potential attempts at market manipulation. This allows regulators to be alerted in real time, enabling them to take prompt action even before the manipulation can fully unfold.

Market risk management

In fast-paced capital markets, end-of-day risk measurement is insufficient. Firms need real-time risk monitoring to stay competitive. Financial institutions can use Amazon Managed Service for Apache Flink to compute intraday value-at-risk (VaR) in real time. By ingesting market data and portfolio changes, Amazon Managed Service for Apache Flink provides a low-latency, high-performance solution for continuous VaR calculations.

This allows financial institutions to proactively manage risk by quickly identifying and mitigating intraday exposures, rather than reacting to past events. The ability to stream risk analytics empowers firms to optimize portfolios and stay resilient in volatile markets.
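
As a simplified illustration of the underlying computation, the following Java sketch estimates VaR by historical simulation over a sliding window of portfolio returns; in a real deployment this logic would live inside Flink keyed state with windowing (all names here are assumptions):

import java.util.ArrayDeque;
import java.util.Deque;

public class HistoricalVaR {
    private final int windowSize;
    private final double confidence;    // e.g., 0.99 for 99% VaR
    private final Deque<Double> returns = new ArrayDeque<>();

    public HistoricalVaR(int windowSize, double confidence) {
        this.windowSize = windowSize;
        this.confidence = confidence;
    }

    // Add the latest portfolio return; returns VaR as a positive loss fraction.
    public double update(double portfolioReturn) {
        returns.addLast(portfolioReturn);
        if (returns.size() > windowSize) returns.removeFirst();
        double[] sorted = returns.stream().mapToDouble(Double::doubleValue).sorted().toArray();
        // VaR is the loss at the (1 - confidence) quantile of the return distribution.
        int idx = (int) Math.floor((1.0 - confidence) * (sorted.length - 1));
        return -sorted[idx];
    }

    public static void main(String[] args) {
        HistoricalVaR var = new HistoricalVaR(250, 0.99);
        double v = 0;
        for (double r : new double[]{-0.012, 0.004, -0.007, 0.009, -0.021}) v = var.update(r);
        System.out.printf("99%% VaR: %.3f%n", v);
    }
}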

Clean up

It's always a good practice to clean up all the resources you created as part of this post to avoid any additional cost. To clean up your resources, complete the following steps:

  1. Delete the CloudFormation stacks from the consumer account.
  2. Delete the CloudFormation stacks from the producer account.

Conclusion

In this post, we showed you how to provide a real-time financial data feed that can be consumed by your customers using Amazon MSK and Amazon Managed Service for Apache Flink. We used Amazon Managed Service for Apache Flink to enrich a raw data feed and deliver it to Amazon OpenSearch Service. Using this solution as a template, you can aggregate multiple source feeds, use Flink to calculate any technical indicator in real time, display data and volatility, or create an alert engine. You can add value for your customers by inserting additional financial information within your feed in real time.

We hope you found this post helpful and encourage you to try out this solution to solve interesting financial industry challenges.


About the Authors

Rana Dutt is a Principal Solutions Architect at Amazon Web Services. He has a background in architecting scalable software platforms for financial services, healthcare, and telecom companies, and is passionate about helping customers build on AWS.

Amar Surjit is a Senior Solutions Architect at Amazon Web Services (AWS), where he specializes in data analytics and streaming services. He advises AWS customers on architectural best practices, helping them design reliable, secure, efficient, and cost-effective real-time analytics data systems. Amar works closely with customers to create innovative cloud-based solutions that address their unique business challenges and accelerate their transformation journeys.

Diego Soares is a Principal Solutions Architect at AWS with over 20 years of experience in the IT industry. He has a background in infrastructure, security, and networking. Prior to joining AWS in 2021, Diego worked for Cisco, supporting financial services customers for over 15 years. He works with large financial institutions to help them achieve their business goals with AWS. Diego is passionate about how technology solves business challenges and provides valuable outcomes by developing complex solution architectures.
