Beyond MessageGroupID: Scaling SQS FIFO ECS-EC2 Listeners to New Heights

Leveraging the In-Flight Message Count to dynamically scale your ECS – EC2 tasks, bringing optimal message processing efficiency alongside cost efficiency.

 

Intro

In AWS SQS, you have two queue types: Standard and FIFO. Scaling based on the number of messages in a Standard queue is straightforward, since we do not have to think about the order in which messages are processed, which allows maximum throughput. In simple terms, if you scale the listeners (ECS tasks in our context), queue messages will be processed faster. The FIFO queue type, however, is slightly different. The main reason for using a FIFO queue is to ensure your messages are delivered and processed in the exact order they are sent.

In such a scenario, how can you scale your message consumers?

 

Default Pattern provided by AWS

MessageGroupID is the default solution for such a requirement. Let’s explore MessageGroupID in more detail.

MessageGroupId is the tag that specifies that a message belongs to a specific message group. Messages that belong to the same message group are always processed one by one, in a strict order relative to the message group (however, messages that belong to different message groups might be processed out of order).

 

Key Components of the above diagram 
  • There are 8 messages in the FIFO queue
  • Those 8 messages belong to three MessageGroupIDs 
    • MessageGroupID 1001 – {M8, M5}
    • MessageGroupID 2001 – {M7, M3, M2, M1}
    • MessageGroupID 3001 – {M6, M4}
  • There are 4 consumers listening to the FIFO queue

 

How messages are processed with the above setup
  • Message Consumer 01 will start consuming the MessageGroupID 2001 related messages in FIFO order (starting with M1)
  • Message Consumer 02 will start consuming the MessageGroupID 1001 related messages in FIFO order (starting with M5)
  • Message Consumer 03 will start consuming the MessageGroupID 3001 related messages in FIFO order (starting with M4)
  • All 3 consumers will process messages in parallel 
  • Message Consumer 04 remains in an idle state

 

 

PAIN with the Default Pattern 

Let’s talk about a use-case and the issues you might encounter in the previously explained configuration.

Use-case
  • You have 100,000 messages in a FIFO queue 
  • The consumers of the queue are ECS – EC2 tasks 
  • Each consumer might take 10 seconds to process a message 
  • You have configured 30 as the desired task count 
  • In the queue statistics, the Messages in flight number (messages not available to other consumers) shows as 10
Outcome
  • Only 10 consumers are processing messages in parallel
  • There are 10 unique MessageGroupIDs in the queue, which contain the 100,000 messages  
  • 20 consumers are in an idle state (unused resources) – low cost efficiency & high performance efficiency    

 

Now let’s consider if we had configured the desired task count as 5:

  • Only 5 consumers are processing messages in parallel 
  • There are 10 unique MessageGroupIDs in the queue, which contain the 100,000 messages  
  • Only 5 MessageGroupIDs’ messages are being processed, when it could scale up to 10 to reach optimal message processing efficiency – high cost efficiency & low performance efficiency  

 

Challenge: Strike a balance between cost and performance when processing messages in a FIFO queue with multiple MessageGroupIDs and consumers.

 

ANALYSIS

What other metrics can we use to overcome this challenge? 

The In-Flight Message count came to the rescue.  

What is an In-Flight Message in SQS?

An Amazon SQS message has three basic states:

  1. Sent to a queue by a producer.
  2. Received from the queue by a consumer.
  3. Deleted from the queue.

A message is considered to be stored after it is sent to a queue by a producer, but not yet received from the queue by a consumer (that is, between states 1 and 2). A message is considered to be in flight after it is received from a queue by a consumer, but not yet deleted from the queue (that is, between states 2 and 3). There is a quota to the number of in flight messages. For FIFO queues, there can be a maximum of 20,000 in flight messages (received from a queue by a consumer, but not yet deleted from the queue). 

In layman’s terms, if consumers process one message at a time, then the in-flight message count represents the number of consumers actively processing messages. In my previous example it was 10. 

 

SOLUTION

Utilizing the In-Flight Message Count for Dynamic Scaling 

A simple algorithm came to the rescue:

  • X → In-Flight Message Count → ApproximateNumberOfMessagesNotVisible Metric
  • Y → Number of Tasks Listening to the Queue →  Desired task count

 

Scenario 01 – (X<Y) Idle Resources (Cost Efficiency)

Understanding the Scale-Down mechanism implemented here is straightforward. Primarily, the system evaluates the count of in-flight messages, and if this count consistently remains lower than the number of tasks deployed, a scale-down operation is initiated. The scale-down process is governed by a target tracking policy, which orchestrates a gradual reduction in the number of tasks.

Scenario 02 – (X=Y) Improving the message processing Efficiency 

In the Scale-Up scenario, if we add a few more tasks, there is a chance we can process new messages with different MessageGroupIDs. Hence we introduced spare tasks: idle ECS tasks that are ready to process incoming messages. The specific number of spare tasks required may vary depending on the desired latency, but typically 1 or 2 spare tasks are adequate for efficient scaling.

In our final solution, we introduced a set of components to manage this dynamic scaling process.

 

The key item here is calculating the TaskUtilization custom metric. You can fine-tune this calculation as per your requirements.

 

TaskUtilization Percentage

The TaskUtilization metric is computed through a custom process utilizing the ApproximateNumberOfMessagesNotVisible value and the current task count. A Lambda function is specifically developed for this purpose, invoking the CloudWatch Metrics API to retrieve five data points representing the one-minute average values for ApproximateNumberOfMessagesNotVisible of the designated FIFO queue. The maximum average value among these data points is then determined.

Subsequently, the desired_task_count is calculated by adding the number of spare tasks to the maximum average of ApproximateNumberOfMessagesNotVisible. The TaskUtilization percentage is then derived by multiplying 100 by the ratio of desired_task_count to the number of currently running tasks. This calculated value is crucial for dynamically scaling the ECS task count.
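To make this concrete, below is a minimal sketch of such a Lambda in Python with boto3. The queue, cluster and service names, the spare-task count, and the custom metric namespace are all hypothetical placeholders to adapt to your own setup.

import boto3
from datetime import datetime, timedelta

# Hypothetical names – adjust to your own queue, cluster and service.
QUEUE_NAME = "orders.fifo"
CLUSTER = "fifo-consumer-cluster"
SERVICE = "fifo-consumer-service"
SPARE_TASKS = 2  # idle headroom so new MessageGroupIDs can be picked up

cloudwatch = boto3.client("cloudwatch")
ecs = boto3.client("ecs")

def lambda_handler(event, context):
    # Five one-minute average data points for the in-flight message count.
    stats = cloudwatch.get_metric_statistics(
        Namespace="AWS/SQS",
        MetricName="ApproximateNumberOfMessagesNotVisible",
        Dimensions=[{"Name": "QueueName", "Value": QUEUE_NAME}],
        StartTime=datetime.utcnow() - timedelta(minutes=5),
        EndTime=datetime.utcnow(),
        Period=60,
        Statistics=["Average"],
    )
    max_in_flight = max((dp["Average"] for dp in stats["Datapoints"]), default=0)

    # Number of tasks currently running in the consumer service.
    svc = ecs.describe_services(cluster=CLUSTER, services=[SERVICE])
    running = svc["services"][0]["runningCount"] or 1

    desired_task_count = max_in_flight + SPARE_TASKS
    task_utilization = 100 * desired_task_count / running

    # Publish the custom metric for the target tracking policy to act on.
    cloudwatch.put_metric_data(
        Namespace="Custom/FifoScaling",  # assumed namespace
        MetricData=[{
            "MetricName": "TaskUtilization",
            "Value": task_utilization,
            "Unit": "Percent",
        }],
    )
    return {"taskUtilization": task_utilization}

With a target tracking policy set to, say, 100 percent TaskUtilization, values above the target (in-flight messages plus spare headroom exceeding the running count) trigger a scale-out, and consistently lower values trigger a gradual scale-in.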

 

 

 

CONCLUSION

The solution helped us improve FIFO processing efficiency by 5X and delivered significant cost savings as well. You can further extend the above solution, for example by starting and stopping this scaling policy only when the OldestMessage metric hits a threshold or based on message count, and by adjusting the max consumer count based on how other services behave (e.g., RDS performance, if the task depends only on RDS).

To wrap up, the above approach might not be the only solution to this challenge; we are continuously evaluating different approaches to bring more cost and performance efficiency to our FIFO message consumers.

 

 

References

  • https://docs.aws.amazon.com/autoscaling/ec2/userguide/as-using-sqs-queue.html

Read More

AWS Essentials Unveiled: Your Starting Point for Engineering Journey Mastery

Are you looking to distinguish yourself from other candidates applying for a Software Engineering position? If that’s the case, this blog will guide you in becoming that standout individual.

Software engineering encompasses a diverse range of areas, necessitating fundamental engineering skills for every individual. Furthermore, in today’s era, practical expertise in cloud infrastructure is an essential asset for engineers aiming to excel in the information technology field. In this blog post, our focus will primarily be on exploring nine vital AWS services that you should adeptly command, ensuring your prominence in the competitive landscape of the IT industry.

 

AWS Products

How many services and products does AWS offer within its cloud? As of August 2023, the count stands at approximately 239. Can you achieve mastery over all of them? Certainly, but is it a necessity? The answer is mostly no.

Now, let’s delve into the exploration of the nine essential services you should aim to master in order to flourish on your engineering journey.

 

Identity & Access Management (IAM)

AWS Identity and Access Management (IAM) is a web service that helps you securely control access to AWS resources. With IAM, you can centrally manage permissions that control which AWS resources users can access. 

Key Concepts to Master

  • Understanding IAM and its role in managing user access.
  • Creating users, groups, and roles.
  • Setting permissions and policies.

Video –> AWS IAM Core Concepts You NEED to Know

 

 

Basic Networking Concepts in AWS (VPC)

With Amazon Virtual Private Cloud (Amazon VPC), you can launch AWS resources in a logically isolated virtual network that you’ve defined. This virtual network closely resembles a traditional network that you’d operate in your own data center, with the benefits of using the scalable infrastructure of AWS.

Key Concepts to Master

  • Overview of Amazon VPC (Virtual Private Cloud)
    • A VPC is a virtual network that closely resembles a traditional network that you’d operate in your own data center. After you create a VPC, you can add subnets.
    • AWS Regions are physical locations around the world where Amazon clusters data centers for application and service delivery in AWS Availability Zones. 
    • An Availability Zone (AZ) is a grouping of one or more discrete data centers that provide applications and services in an AWS region.
  • Subnets, route tables, and security groups
    • A subnet is a range of IP addresses in your VPC. A subnet must reside in a single Availability Zone. After you add subnets, you can deploy AWS resources in your VPC.
    • Public & Private Subnet
    • Use route tables to determine where network traffic from your subnet or gateway is directed.
    • A security group acts as a firewall that controls the traffic allowed to and from the resources in your virtual private cloud (VPC). You can choose the ports and protocols to allow for inbound traffic and for outbound traffic.

Video –> AWS Networking Fundamentals

 

 

Deploy your services in EC2

Amazon EC2 provides scalable computing capacity in the AWS cloud. Leveraging it enables organizations to develop and deploy applications faster.

Key Concepts to Master

  • Launching, configuring, and connecting to EC2 instances.
    • A security group acts as a virtual firewall for your EC2 instances to control incoming and outgoing traffic. 
    • Inbound rules control the incoming traffic to your instance.
    • Outbound rules control the outgoing traffic from your instance. 
  • Overview of instance types, storage options
    • The instance type that you specify determines the hardware of the host computer used for your instance. Each instance type offers different compute, memory, and storage capabilities, and is grouped in an instance family based on these capabilities. 
    • Data storage options for EC2 instances range from EBS to EFS
  • Elastic Load Balancer 
    • Elastic Load Balancing automatically distributes your incoming traffic across multiple targets, such as EC2 instances, containers, and IP addresses, in one or more Availability Zones. 

Video –> Launch an AWS EC2 Instance Tutorial

 

 

Container Orchestration via Amazon ECS

Amazon Elastic Container Service (Amazon ECS) is a fully managed container orchestration service that simplifies your deployment, management, and scaling of containerized applications. This service helps you run your applications or services in Docker containers.

Key Concepts to Master

  • Amazon Elastic Container Registry (Amazon ECR) is a fully managed container registry offering high-performance hosting, so you can reliably deploy application images and artifacts anywhere.
  • Cluster – Logical group of container instances
  • Container Instance – EC2 instance on which the ECS agent runs and which is registered to a cluster
  • Task Definition – Description of the application to be deployed
  • Task – An instantiation of a task definition running on a container instance
  • Service – Runs and maintains predefined tasks simultaneously
  • Container – Docker container created during task instantiation
  • ECS launch type – EC2 and Fargate 

Video –> Deploy an Application to Amazon ECS With EC2 | Docker | ECR | Fargate | Load balancer

 

 

Serverless computing with AWS Lambda

AWS Lambda is a serverless, event-driven compute service that lets you run code for virtually any type of application or backend service without provisioning or managing servers.

Key Concepts to Master

  • Writing and deploying serverless functions (a minimal handler sketch follows this list)
    • The Serverless Framework is a very good tool for deploying serverless applications 
  • Event triggers and integration with other AWS services.
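For illustration, here is a minimal sketch of a Python Lambda handler; the function name and event shape are assumptions, and the response follows the proxy-style format an API Gateway trigger expects.

import json

def lambda_handler(event, context):
    # Read an optional name from the triggering event.
    name = (event or {}).get("name", "world")
    return {
        "statusCode": 200,
        "body": json.dumps({"message": f"Hello, {name}!"}),
    }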

Video –> How to Deploy AWS Lambda using Serverless Framework

 

Object Storage (S3)

Amazon Simple Storage Service (Amazon S3) is an object storage service offering industry-leading scalability, data availability, security, and performance. Most of the time, S3 is used to store the static content of your web application frontend. 

Key Concepts to Master

  • Understanding object storage and the basics of S3 buckets.
    • You can use Amazon S3 to host a static website. On a static website, individual webpages include static content; they might also contain client-side scripts.
    • Couples tightly with AWS CloudFront for content delivery
  • Uploading, downloading, and managing objects in S3 (see the sketch after this list).
    • Object versioning 
  • Overview of data durability, availability, and access control.
    • Public access 
    • Private buckets & bucket policies 
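For instance, here is a minimal boto3 sketch of the basic object operations; the bucket name and file paths are hypothetical.

import boto3

s3 = boto3.client("s3")
BUCKET = "my-example-bucket"  # hypothetical bucket you own

# Upload a local file as an object.
s3.upload_file("report.pdf", BUCKET, "reports/report.pdf")

# Download it back.
s3.download_file(BUCKET, "reports/report.pdf", "report-copy.pdf")

# List objects under a prefix.
response = s3.list_objects_v2(Bucket=BUCKET, Prefix="reports/")
for obj in response.get("Contents", []):
    print(obj["Key"], obj["Size"])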

Video –> Secure Static Hosting with S3, CloudFront and OAI  

 

 

Database or AWS fully managed Relational Database (RDS)

Amazon Relational Database Service (Amazon RDS) is a collection of managed services that makes it simple to set up, operate, and scale databases in the cloud. Choose from seven popular engines — Amazon Aurora with MySQL compatibility, Amazon Aurora with PostgreSQL compatibility, MySQL, MariaDB, PostgreSQL, Oracle, and SQL Server.  

Key Concepts to Master

  • Creating and configuring RDS instances.
    • Select the Database engine and Size of the instance 
  • Connecting applications to RDS databases.
    • Security group configuration and port configuration 
    • RDS Cluster (Reader & Writer Instances and Endpoints)

Video –> AWS RDS MySQL Database Setup | Step by Step Tutorial

 

 

Monitor & Access Your Application Logs via CloudWatch

Amazon CloudWatch collects and visualizes real-time logs, metrics, and event data in automated dashboards to streamline your infrastructure and application maintenance. It is a must-have tool to troubleshoot and debug your application issues. 

Key Concepts to Master

  • Application Logs
    • Log Groups which collect the service logs 
    • Log Insights to query logs 
  • Setting up alarms and notifications.
    • Set up infrastructure alarms 
    • Create monitoring dashboards 
    • Define rules to trigger events (Lambda functions)
  • Using AWS CloudTrail for auditing and tracking API activity.
    • To monitor AWS service level access

Video –> AWS Cloudwatch Service Overview | Console Walkthrough

 

 

By delving deeper into the services and their key features mentioned above, you can acquire the necessary skills to advance in your software engineering journey. Given below is a sample architecture diagram featuring the services we mentioned and some additional ones.

Architecture

 

Below, you’ll find valuable links to help you study these services extensively. I hope these resources assist you in mastering AWS services and transforming into an AWS Ninja within your organization.

  1. AWS Networking Fundamentals
  2. AWS IAM Core Concepts You NEED to Know
  3. Secure Static Hosting with S3, CloudFront and OAI  
  4. Launch an AWS EC2 Instance Tutorial
  5. Deploy an Application to Amazon ECS With EC2 | Docker | ECR | Fargate | Load balancer
  6. How to Deploy AWS Lambda using Serverless Framework
  7. AWS RDS MySQL Database Setup | Step by Step Tutorial
  8. AWS Cloudwatch Service Overview | Console Walkthrough
  9. Intro to AWS – The Most Important Services To Learn

Read More

Cost-Effective Scaling: Optimizing RDS Expenses While Simultaneously Facilitating Business Growth

This marks the inaugural article in a series titled ‘Initiate your SaaS journey by taking small steps and steadily enhancing capabilities with AWS‘. The series delves into the challenges faced during the early years of my SaaS industry journey (Velaris.io) and how these obstacles were overcome through a gradual improvement approach with AWS capabilities and practical engineering.

In this article, we will explore a specific use case where we encountered a substantial 140% increase in RDS costs compared to the previous month’s bill in AWS. We will delve into the design changes we implemented in our architecture to effectively resolve this cost surge.

 

RDS Costs Surged by 140% Between November and December

As depicted in the chart, there was a roughly 2.4X rise in RDS cost between November 2022 (3,300 USD) and December 2022 (8,000 USD) – the 140% increase mentioned above. This unexpected increase prompted us to conduct a thorough analysis to understand the factors contributing to this cost surge.

 

Analysing the Problem

Initially, we began our investigation by reevaluating our existing application architecture and its associated data load. Concurrently, we utilised the AWS Cost Explorer tool to gain a more detailed breakdown of the RDS cost.

The diagram illustrates that we were employing a single Database (DB) to manage both our Web application data load and Reporting ETL data load. This meant that a single RDS Aurora Postgres instance handled all data storage and processing load. Additionally, in December, three new customers joined our platform, and their data volumes were significantly larger compared to the four existing customers.

The Cost Explorer analysis revealed a direct correlation between the increased data load from the new customers and the rising I/O requests in the Aurora Postgres database, which served as the primary cost driver. The diagram below depicts the substantial surge in I/O operations during this period. Our final conclusion was that the influx of data from the new customers led to extensive processing time and necessitated a large number of I/O operations in the DB.

In light of these findings, we promptly recognised the need to make architectural changes to address scalability issues and optimise costs effectively.

 

Realising the Solution

The optimal resolution entails a twofold approach: reducing RDS costs and enhancing application architecture scalability. To achieve cost reduction, we must transition to a different RDS type with a more suitable costing strategy than Aurora Postgres, aligning precisely with our requirements. Moreover, to bolster scalability, it is imperative to decouple the reporting data processing, or the ETL process, from the Web app DB. By implementing these strategic changes, we can attain a more cost-effective and scalable solution for our application.

Upon careful analysis of RDS costing, it became evident that Postgres RDS emerged as the superior choice for handling heavy data processing compared to Aurora Postgres in our use case. The following cost-related factors elucidate why it was the more favorable option:

  • Aurora Postgres: $0.20 per 1 million I/O requests
  • RDS Postgres: $0.116 per IOPS-month of provisioned io1 IOPS (input/output operations per second)

Another key factor is that we can configure how many IOPS we need to provision for our Postgres RDS instance. Consider this example, taken from the AWS EBS pricing page:

For example, let’s say that you provision a 2,000 GB volume for 12 hours (43,200 seconds) in a 30-day month. In a region that charges $0.125 per GB-month, you would be charged $4.167 for the volume ($0.125 per GB-month * 2,000 GB * 43,200 seconds / (86,400 seconds/day * 30-day month)).

Additionally, you provision 1,000 IOPS for your volume. In a region that charges $0.065 per provisioned IOPS-month, you would be charged $1.083 for the IOPS that you provisioned ($0.065 per provisioned IOPS-month * 1,000 IOPS provisioned * 43,200 seconds /(86,400 seconds /day * 30-day month)).

For this example, the charges would be:
$5.25 ($4.167 + $1.083).
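As a quick sanity check, the same proration can be expressed in a few lines of Python; the rates and quantities are the ones from the example above.

# Prorated EBS charges: cost scales with provisioned size/IOPS and
# the fraction of the month the volume exists.
SECONDS_PER_MONTH = 86_400 * 30

def prorated(rate_per_month, units, seconds):
    return rate_per_month * units * seconds / SECONDS_PER_MONTH

storage = prorated(0.125, 2_000, 43_200)  # 2,000 GB for 12 hours -> $4.167
iops = prorated(0.065, 1_000, 43_200)     # 1,000 IOPS for 12 hours -> $1.083
print(round(storage + iops, 2))           # 5.25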

 

Replication via AWS Data Migration Service (DMS)

Upon settling on the decision to utilise RDS Postgres as the reporting database, the subsequent challenge arose: devising a way to establish data replication between the Aurora DB and the Reporting RDS DB. Given that these two were distinct DB instance types and necessitated Master – Master data replication, we embarked on evaluating several AWS services.

After careful consideration, we opted for the AWS Data Migration Service (DMS) to facilitate the one-way data replication from the Web App DB to the Reporting DB. The appeal of DMS lies in its fully managed nature, which translates to near-zero operational overhead. Consequently, we proceeded with DMS and, to date, have encountered no issues with the replication process.

Finally, by combining all the aforementioned components, we successfully realised our desired architecture, as illustrated in the image below.

 

Final Outcome and Learnings

Our primary objective was to decrease the RDS cost. By February, following the aforementioned modifications, we had fully reversed the 140% cost surge, bringing RDS expenses back down to their earlier level. The diagram below clearly demonstrates the significant decrease in RDS costs.

Furthermore, this architectural alteration not only led to substantial cost reduction but also significantly enhanced the stability of our application. The full decoupling of the Web App DB from the Reporting Data load ensured that user interactions with the application would not interfere with ETL data load processing, resulting in improved overall application stability.

An essential lesson learned from this experience is the importance of continuously reassessing the architecture during periods of business growth. Making precise design choices promptly and adeptly in response to rapid changes is imperative to achieving and sustaining successful outcomes.

Through this article, I aimed to share my experiences on effectively optimising RDS costs while supporting business growth. I intend to build on this knowledge by presenting more use cases in the forthcoming months as part of my article series, ‘Initiate your SaaS journey by taking small steps and steadily enhancing capabilities with AWS‘. I hope these future articles will provide valuable insights to readers, guiding them in their own SaaS journey and AWS implementation.

 

Read More

Data Warehousing for Beginners

Basic Introduction

Data Warehousing (DW) is a process for collecting and managing data from varied sources to provide meaningful business insights. A data warehouse is typically used to connect and analyze business data from heterogeneous sources. The data warehouse is the core of the BI system, which is built for data analysis and reporting. You want a data warehouse to analyze petabytes of historical data that you’ve ingested from your systems, with queries that run in minutes.

Staging Layer (Focus on the “E”)

  • Mirror images of the source objects (get data from the source ASAP)

    • Non-persistent staging layer : load and delete after moving to the user access layer

    • Persistent staging layer : contains the history of data; new rows and updates are accommodated accordingly

  • Prefer to have a persistent staging layer – exact data as the source / will need more storage / archive to S3 after a few years

User Access Layer

  • Dimensional data : Structured data as per the requirement of the frontend applications / reports

ETL (Extract, Transform, Load)

  • Initial ETL

    • One time ETL

    • Before go-live, get all the data from the source

    • Will bring in

      • Data needed for BI and analytics

      • Historical data

  • Incremental ETL

    • Data that refreshes

      • New data

      • Modifications of data (updates, soft deletes)

 

Incremental ETL Patterns (Near Real Time, Hourly, Daily, Weekly)

  • Append : Appending new information

  • In-place update : Doing updates in existing rows

  • Complete replacement : Delete all existing data add the new data set

  • Rolling append : Wipe out the oldest data and append the latest (e.g., keep only 36 months of data in the DW at all times)

Data Transformation

  • Uniform the data : Data coming from different sources will have different representations; we need to unify them.

    • Data values

    • Data types & size

    • De-duplication : remove duplicate data (mainly for master data)

    • Dropping columns : remove unwanted columns from the source when we move to DW

    • Value based row filtering : remove unwanted rows based on the values

    • Correcting known errors : data issues to be fixed when moving data to the DW

  • Restructure the data

    • Design the data structure

Read More

Cloud SaaS Security Patterns & How AWS Services Can Address Them

Top-Level Cloud Security Requirements

  • R1: Must provide protection to the system’s components. This requirement concerns the protection of the system’s components, both the software (e.g., a piece of code) and the hardware (e.g., sensor devices) that are part of the system.

  • R2: Must be able to prevent unauthorized access and intrusion into the system and resources. This requirement is about ensuring that only a genuine user or application can access the application or the system’s resources.

  • R3: Must be able to monitor network requests. The main goal is to monitor network requests in order to prevent potential attacks on the system and its resources.

  • R4: Must have an auditing option and be able to recover from a breach. This requirement concerns the auditing of system and resource usage to detect anomalies.

  • R5: Must ensure data protection at rest and in transit. This requirement concentrates on how to protect data both in transit and at rest, especially when they reside on a public Cloud platform.

  • R6: Must ensure privacy protection and regulatory compliance. This requirement is about how to ensure privacy protection and regulatory compliance for data processed in the Cloud infrastructure.

  • R7: Must provide secure communication between modules. A system may be made of different modules deployed in the same or different Cloud platforms. Thus, it is important to ensure secure communication between those modules.

  • R8: Must provide protection to the system’s resources. The system’s resources here refer to the Cloud resources required to run the Cloud application; they must be protected from excessive and unnecessary use in order to ensure economic durability and durable availability of the application running on the Cloud platform.

 

AWS Services To Rescue

Compliance and Regulatory

  • Data Citizenship (Required: Yes): How can a Cloud-based solution achieve regulatory compliance with respect to data storage locality? Approach: AWS Tags (location tags for the resources).
  • Cryptographic Erasure (Required: No): How can a dataset be reliably and securely erased after it was stored in the Cloud? If we replicate the data in multiple regions, this needs to be addressed. Approach: AWS KMS (ensure data is encrypted at rest, with KMS managing the key).
  • Shared Responsibility Model (Required: Yes): How can a Cloud services consumer effectively manage their Cloud application’s legal and regulatory compliance? Approach: Usage of AWS managed services.
  • Compliant Data Transfer (Required: Yes): How can data be transferred for processing to other parties in potentially different jurisdictions while staying in compliance with legal and regulatory requirements? Approach: AWS Tags (location tags for the resources); when we use third-party functionalities, which are often exposed through APIs, we need to adhere to data transfer guidelines.
  • Data Retention (Required: Yes): How long is personal information retained? Approach: A Lambda function to automate the data clearing process.
  • Data Lifecycle (Required: Yes): How can the data lifecycle be managed efficiently and securely in the Cloud? Approach: AWS Data Lifecycle Manager.
  • Intentional Data Remanence (Required: Yes): How can data in the Cloud be protected from accidental or malicious deletion? Approach: RDS data replication/redundancy.

Identification, Authentication and Authorization

  • Multi-Factor Authentication (Required: Yes): How to simply, yet securely, authenticate physical users of Cloud-based applications? Approach: AWS Cognito with MFA.
  • Federation (Single Sign-On) (Required: Yes): How to authenticate with customer-provided user identities? Approach: AWS Cognito with AWS SSO.
  • Access Token (Required: Yes): How to control human or machine user access to Cloud APIs? Approach: AWS Security Token Service with Cognito.
  • Mutual Authentication (Required: Yes): How to establish the identity of parties in a Cloud communication channel? Without proper authentication between communicating parties, man-in-the-middle attacks are possible. Approach: AWS Client VPN, and TLS/SSL certificates via AWS Certificate Manager.
  • Secure User Onboarding (Required: Yes): How to securely perform initial registration of Cloud application users? Approach: Define a secure onboarding process / AWS customer onboarding process.
  • Identity and Access Manager (Required: Yes): How to securely and effectively manage a user database and provide authentication and authorization functionality in a Cloud application? Approach: AWS IAM & Cognito.
  • Per-request Authentication (Required: Yes): How to continuously prove the identity of the user when they perform sensitive operations? Approach: CloudWatch with events and notifications; tools monitoring the user’s activities from the start to the end of the usage session; JWT token validation throughout the request life cycle with user activity logging; detect abnormal activities via log analysis.
  • Access Control Clearance (Required: Yes): How to enforce access and usage control policies for different types of authentication? Approach: Implement a central authorization module and validate access in the FE and BE (role-based access).

Secure Development, Operation and Administration

  • Bastion Server (Required: Yes): How to access Cloud resources without exposing them directly to the Internet? Approach: A bastion host outside the firewall.
  • Automated Threat Detection (Required: Yes): How to detect network attacks on Cloud internet endpoints? Approach: Amazon GuardDuty.
  • Economic Durability (Required: Yes): How to establish and maintain availability of Cloud services in the face of distributed denial-of-service attacks? Approach: AWS WAF & CloudWatch.
  • Vulnerability Management (Required: Yes): How to detect and respond to found vulnerabilities? Approach: Use external tools.

Privacy and Confidentiality

  • End-to-End Security (Required: Yes): How to communicate a message between two parties so that its confidentiality is protected across all components in the Cloud communication channel? Approach: AWS KMS and Certificate Manager (security guarantees are needed for data in transit and at rest).
  • Computation on Encrypted Data (Required: No): How to outsource data for computation to a Cloud service without disclosing it in the process? Approach: The Cloud provider maintains the keys; we need to fully trust the Cloud provider.
  • Data Anonymization (Required: Yes): How to remove personal identifiers from datasets to protect privacy, while keeping the datasets still valuable for processing? Approach: AWS Athena, CloudWatch and Lambda to automate a scan.
  • Processing Purpose Control (Required: No): How to ensure data is used or processed in accordance with its original intended purpose? Approach: An automated tool to trace and audit its usage.

Secure Architecture

  • Virtual Network (Required: Yes): How to connect components of a Cloud application architecture without unnecessarily exposing them to the Internet? Approach: AWS VPC.
  • Web Application Firewall (Required: Yes): How to protect web API endpoints from unauthorized access and abuse? Approach: AWS WAF.
  • Secure Element (Required: Yes): How to securely provide and strongly protect the identity of IoT devices or external services? Approach: Use a unique identity; PKI should be the foundation of any IoT security strategy / external service.
  • Secure Cold Storage (Required: Yes): How to protect the availability of large amounts of data securely and cost-effectively? Approach: AWS Glacier for cold storage with encryption.
  • Certificate and Key Manager (Required: Yes): How to securely and effectively create, provision and revoke certificates and keys for securing data at rest and in transit? Approach: AWS KMS and Certificate Manager.
  • Hardware Security Module (Required: No): How to best protect the cryptographic secrets owned by Cloud tenants while still enabling Cloud processing infrastructure to compute on the tenant data? Approach: AWS CloudHSM.
  • Secure Auditing (Required: Yes): How to record and report security-related behavior in an operating Cloud system? Approach: AWS security audit checklist (https://aws.amazon.com/blogs/security/auditing-security-checklist-for-aws-now-available/).

 

Read More

Let’s start the 2021 New Year with SOLID Principles

Firstly, let’s figure out what is meant by SOLID principles. SOLID is a set of five principles that can be used in object-oriented programming: a set of best practices / design patterns to follow when you are designing a class/object structure.

One famous quote in software engineering is “It is not enough for code to work”, from Robert C. Martin’s Clean Code. Keeping that in mind, let’s evaluate the SOLID principles and how they will help software development in the long run.

When and why we need to apply SOLID?

SOLID principles will help us to lay the foundation on building clean and maintainable architectures.

When you don’t design your application with best practices, you will end up with many issues sooner rather than later. For example:

  • High Code Fragility: Fragility is the tendency of the software to break in many places every time it is changed
  • High Code Rigidity: Rigidity is the tendency of software to be difficult to change, even in simple ways

Having fragility and rigidity in your project means having symptoms of tech debt, which is the silent killer of all software projects. In most software development processes, with business priorities pushing for quick delivery, engineers tend to prioritize fast delivery over code quality, which introduces technical debt.

Applying SOLID principles helps us control technical debt. What are those SOLID principles?

  1. SRP – Single Responsibility Principle
  2. OCP – Open-Closed Principle
  3. LSP – Liskov Substitution Principle
  4. ISP – Interface Segregation Principle
  5. DIP – Dependency Inversion Principle

Let’s take each principle one by one and understand it at a very high level so we can apply them in our projects.

Single Responsibility Principle (SRP)

A class should do only one thing, and therefore it should have only a single reason to change. Some of the common responsibilities in our code are business logic, user interface, persistence, logging and orchestration.

A common anti-pattern is an Invoice class containing the implementations of functions like printInvoice and saveToFile. Ideally those two should live in different classes, such as InvoicePrinter and InvoicePersistence, each with a single responsibility.

Also avoid a common Utils class holding all helper functions; have specialized classes instead.
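Here is a minimal Python sketch of that refactoring; the Invoice fields are illustrative assumptions.

class Invoice:
    """Holds invoice data and business logic only."""
    def __init__(self, number, amount):
        self.number = number
        self.amount = amount

class InvoicePrinter:
    """Single responsibility: presentation."""
    def print_invoice(self, invoice):
        print(f"Invoice {invoice.number}: {invoice.amount:.2f}")

class InvoicePersistence:
    """Single responsibility: persistence."""
    def save_to_file(self, invoice, path):
        with open(path, "w") as f:
            f.write(f"{invoice.number},{invoice.amount}")

Now each class has exactly one reason to change: the invoice rules, the print layout, or the storage mechanism.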

 

Open-Closed Principle (OCP)

OCP means classes should be open for extension and closed to modification. Closed for modification means that each new feature should not modify existing source code. Open for extension means a component should be extendable to make it behave in new ways.

There are two OCP implementation strategies: one via inheritance and the other via the strategy pattern. The common usage is the strategy pattern to extend behaviors. For example, having an interface called InvoicePersistence and implementing any number of different classes (DB persistence, file persistence) based on the main interface ensures invoice persistence is extendable in many ways.
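A minimal Python sketch of that strategy-style extension point, with the concrete persistence bodies elided:

from abc import ABC, abstractmethod

class InvoicePersistence(ABC):
    """The stable abstraction that callers depend on."""
    @abstractmethod
    def save(self, invoice):
        ...

class DatabasePersistence(InvoicePersistence):
    def save(self, invoice):
        ...  # write the invoice to a database

class FilePersistence(InvoicePersistence):
    def save(self, invoice):
        ...  # write the invoice to a file

Adding, say, an S3Persistence later is a pure extension: a new class is written, and no existing caller of InvoicePersistence is modified.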

 

Liskov Substitution Principle (LSP)

What is the Liskov principle? In simple terms, “Any object of a type must be substitutable by objects of a derived type without altering the existing functionality of that program”. Rather than thinking about the Is-a relationship, think about Is-substitutable-by in the object relationships of your application context and design the object hierarchy accordingly; that is the key takeaway of LSP. We will observe violations of LSP when there are partially implemented interfaces in classes (functions which throw a not-implemented exception). Therefore, avoid empty implementations and have a proper class hierarchy to avoid such scenarios.
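A minimal Python sketch of the idea, using the commonly cited bird example rather than anything from a real codebase:

class Bird:
    def eat(self):
        return "eating"

class FlyingBird(Bird):
    def fly(self):
        return "flying"

class Penguin(Bird):        # substitutable wherever a Bird is expected
    pass

class Sparrow(FlyingBird):  # substitutable wherever a FlyingBird is expected
    pass

# The anti-pattern would be Penguin(FlyingBird) with fly() raising
# NotImplementedError: any caller holding a FlyingBird would then break.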

 

Interface Segregation Principle (ISP)

The key point to note here is that the term interface is not referring to a Java interface; it can be an interface or an abstract class in your application. By definition, ISP states that “Clients should not be forced to depend on methods that they do not use”. Therefore, having small interfaces with a single focus will comply with the ISP, and doing so reinforces the LSP and SRP. The key advantage of lean interfaces is that they minimize dependencies on unused members and reduce code coupling. How do you identify a “fat” interface? It’s not difficult: an interface with a large number of method definitions is a symptom of a fat interface. Analyzing such interfaces and breaking them down is the way forward. If you are dealing with legacy code and want to refactor it to adhere to the ISP, you can use the “Adapter” design pattern to fix the existing issues.
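A minimal Python sketch of splitting a hypothetical fat interface into lean, single-focus ones:

from abc import ABC, abstractmethod

# Lean interfaces: clients depend only on the methods they actually use.
class Workable(ABC):
    @abstractmethod
    def work(self):
        ...

class Eatable(ABC):
    @abstractmethod
    def eat(self):
        ...

class Robot(Workable):          # no empty eat() implementation needed
    def work(self):
        print("working")

class Human(Workable, Eatable):
    def work(self):
        print("working")
    def eat(self):
        print("eating")

A single fat Worker interface with both methods would have forced Robot to stub out eat(), which is exactly the kind of empty implementation called out under LSP.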

 

Dependency Inversion Principle (DIP)

This principle mainly captures two key concepts, Dependency Injection (DI) and Inversion of Control (IoC), and how they should be used in our application. These are well-known concepts; the following reference links explain them in detail (DI, IoC). Java frameworks like the Spring Framework have these two concepts in-built.
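A minimal Python sketch of constructor-based dependency injection; all names are illustrative, and frameworks like Spring automate this wiring in Java:

from abc import ABC, abstractmethod

class NotificationSender(ABC):
    @abstractmethod
    def send(self, message):
        ...

class EmailSender(NotificationSender):
    def send(self, message):
        print(f"email: {message}")

class OrderService:
    # The high-level service depends on the abstraction; the concrete
    # sender is injected from outside rather than constructed here.
    def __init__(self, sender: NotificationSender):
        self._sender = sender

    def place_order(self):
        self._sender.send("order placed")

OrderService(EmailSender()).place_order()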

This concludes the five SOLID principles. You can find good code samples for each principle under the reference section as well. In addition to SOLID, there are a few other ways to keep your architecture clean, for example: constant refactoring, application of design patterns, and TDD / unit testing. When you apply SOLID principles, you will also get the following short- and long-term benefits.

  • Code will be easy to understand and reason about
  • Changes are faster and have minimum risk level
  • Highly maintainable in the long run
  • Cost effective

The decision to apply them in the correct scenario is totally up to you!

 

Reference

https://www.baeldung.com/solid-principles

https://www.digitalocean.com/community/conceptual_articles/s-o-l-i-d-the-first-five-principles-of-object-oriented-design#:~:text=SOLID%20is%20an%20acronym%20for,OOD)%20principles%20by%20Robert%20C.&text=O%20%2D%20Open%2Dclosed%20Principle,I%20%2D%20Interface%20Segregation%20Principle

https://app.pluralsight.com/library/courses/solid-software-design-principles-java/table-of-contents

https://refactoring.guru/design-patterns/adapter

https://dzone.com/articles/ioc-vs-di

Read More

Leveraging Application Load Balancer (ALB) to Mock APIs for Test Automation with Zero Application Code Changes

“Mock Test APIs In a Distributed Application Environment with AWS Serverless Stack and Zero Application Code Changes”    

 

In the enterprise world, your application might depend on many other subsystems, and most of the time you have zero ownership of those. The actual issue surfaces when your application depends on data from those subsystems: the data might be refreshed frequently, so you can’t rely on a fixed data set for your test automation work. It will always be a bottleneck for your QA team as well. How can you overcome this situation?

As developers, we might see many options. One quick solution that comes to mind is to have a set of pre-defined APIs: if the client requests a matching URL pattern, the backend system intercepts the request and sends a mock data set to the client. In that approach, you need to make code changes in the backend system, and you need a mechanism to prevent such interceptors or conditions from being pushed to production. It will be an overhead all the time.

We therefore set out to implement a solution with zero code changes to our application code and zero operations and maintenance cost. Since our application is deployed in AWS, we thought of using a few serverless services to address the above requirement. Let’s drive through it step by step.

 

Using AWS Application Load Balancer (ALB)

The usual deployment pattern for an enterprise web application is to have a load balancer on top of the application servers. In AWS, you have a component called the Application Load Balancer (ALB), where you can apply many actions to your client requests before they reach the application servers. We were actually using the AWS Classic Load Balancer, but given the feature set of the ALB we moved to it and used some of its features in our implementation.

 

Mock Small Size Response

 

In the ALB you have listeners, and for each listener you can configure rules. In the listener rules, you can configure a fixed response.

Example: You need to mock the order history data of a certain account, so if a user submits a request URL as below, you need to return a fixed data set every time. To configure it, just use the fixed-response listener rule.

IF URL path is /PMT/api/fetchOrderGuide/067-123456, (You can add many other rule conditions to filter your request)

Then Return a Fixed Response (Add your fixed response body)

It’s as simple as that. But there is a limitation on the max length of the response body, which is 1024 characters. Hence, if you have a larger response body, you have to move to the next option.

 

Medium Size Response

 

If you have a medium-size response of less than ~10,000 characters, then an integration of ALB listener rules and API Gateway will help your cause. Here is how you achieve it:

There is a rule action called Redirect, which redirects requests from one URL to another. Once you have the redirect capability, you need a component that can produce the response without much effort. Since API Gateway is fully managed by AWS and has a feature to generate mock APIs, you can choose API Gateway as the response producer.

In API Gateway (APIG), create a Mock API. For testing purposes you can use an existing APIG and do the following steps, or follow this link to start from scratch.

  1. Create a resource in the APIG (ex: /fetchOrderguide)
  2. Attach an HTTP GET method to the resource; while doing so, select Mock as the Integration type
  3. In the GET method execution pane, click on the Integration Response container
  4. Expand the HTTP 200 row and click on Mapping Templates
  5. Click on application/json in the Content Type section
  6. Add the mock response in the text area and save
  7. Deploy the new resource in the stage you wish, and note down the endpoint URL for the newly created GET method

Now the Mock API exposed via API Gateway is ready to use. You can test the API through the API Gateway console if needed. By default, the API Gateway and its APIs are public, but if you need to adhere to your organization’s security compliance, you might need to make them private. You can follow this AWS resource to make your API Gateway private.

The final step for medium-size responses is to link the Mock API and the ALB. As per the diagram below, you can add the filtering rules and then use the Redirect action. In that section, provide the API URL information and keep the other options as per the diagram.

Now everything is set to test your API via Postman. If your request matches the listener rules, the ALB will redirect it to API Gateway, which uses the Mock API to generate the response and return it to the client.

Note: If you need to test this with your frontend (browser), then you need to start the browser without CORS. Follow this link to start Chrome without CORS.

 

Large Size Response

 

There is a response limit for API Gateway Mock API responses as well. If you try to add a very large response, then you will get this warning.

“The resource being saved is too large. Consider reducing the number of modeled parameters, the number of response mappings, or perhaps the size of your VTL templates if used.”

Therefore, you need to think of an alternative. The quickest solution is to use a Lambda function to generate the large response. Lambda functions are serverless and incur very little cost.

  1. Create the lambda function using the AWS console
  2. Give a function name
  3. Use Author from scratch and use NodeJs 12.x as the runtime. (Feel free to use any)
  4. Choose a role for execution
  5. In the index.js file, add:

exports.handler = function(event, context) {

  // Return the canned large response immediately. This uses the legacy
  // Node.js callback style; context.done ends the invocation with the
  // second argument as the function's result.
  context.done(null, <Add the large response>);

}

  6. Save and test the function to check whether you receive the response as expected

Now your large response is ready. Next you need to integrate it with API Gateway.

  1. Create a resource in the APIG (ex: /fetchOrderguideGroupView)
  2. Attach an HTTP GET method to the resource; while doing so, select Lambda Function as the Integration type
  3. Provide the ARN of the newly created lambda in the Lambda function text box
  4. Click Save
  5. Test the API to verify that you get the response from the Lambda function

 

All good now. Repeat the same steps we followed in the final step of the medium-size response: link the Mock API with the ALB using the Redirect rule action. Test with Postman to confirm your integration works as expected.

There might be many other alternatives, but this approach met our requirement of mocking APIs for test automation in a very fast, serverless manner with zero code changes to the application.

May we all be well, happy and peaceful, May no harm come to you!

Read More

Making Your Enterprise Application 100% Serverless with AWS

There was an era in which we all fussed about cloud computing; however, right now the hype is mainly about serverless computing. In this article, I will brief you about serverless computing and share my experience in working with some serverless technologies that my team and I used to develop enterprise solutions.

My list of topics is as follows – each will have a quick introduction to the technology used, along with some web links which we looked at when integrating these into our final solution.

  • Serverless Computing
  • Requirement
  • Architecture & AWS Services
  • Lambda Functions for Microservices and BFF
  • API Gateway
  • Cognito for User Federation
  • ECS Fargate for Long Running Tasks
  • AWS Code Pipeline & Code Build for CI/CD
  • Other Services

[Please visit this link to get details on each of the above topics]

Read More

Client Side Load Balancing Vs Server Side Load Balancing: How Client Side Load Balancing works?

Firstly let’s see what is load balancing;

It’s mainly distributing the work of one computer across two or more similar computers. This ensures reliability and increases the responsiveness of the system.

 

 

Load balancers generally group into two categories.

Layer 4: Acts on network and transport layer protocols (IP, TCP, FTP, and UDP)

Layer 7: Distributes requests based upon data found in application layer protocols such as HTTP

 

Load Balancers (LB) also come in two types (more information):

Hardware LB: F5 BIG-IP, Cisco, Citrix

Software LB: NGINX, HAProxy, LoadMaster

 

All these different load balancers use different algorithms to distribute the load among the application pool. Some of the industry-standard load balancing algorithms are:

  • Round robin: This method continuously rotates a list of services that are attached to it. When the virtual server receives a request, it assigns the connection to the first service in the list, and then moves that service to the bottom of the list.
  • Least connections: The default method; when a virtual server is configured to use least connections, it selects the service with the fewest active connections.
  • Least response time: This method selects the service with the fewest active connections and the lowest average response time.

That’s all for the very basic theory of load balancing. Now let’s see how server-side load balancing works in the real world.

 

Server Side Load Balancing

A server-side load balancer sits between the client and the server farm, accepting incoming network and application traffic and distributing it across multiple backend servers using various methods. Typically, the load balancer checks the health of the server pool underneath and uses one of the algorithms we discussed earlier to distribute the load. This was the most common mechanism used in the past to manage application load. However, the rise of client-side load balancing is now at its peak. Let’s dig deep into that context.

 

Client Side Load Balancing

As we discussed, in server-side load balancing a middle component is responsible for distributing the client requests to the servers. In client-side load balancing, that middle component is removed from the decision making: the client itself decides which server to forward the request to. How it works is very simple: the client holds the list of server IPs it can deliver requests to, selects an IP from the list (randomly, in the simplest case) and forwards the request to that server.
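A minimal Python sketch of the idea, assuming a hypothetical static server list; real clients such as Ribbon refresh this list from a registry like Eureka:

import itertools
import random

SERVERS = ["10.0.0.1:8080", "10.0.0.2:8080", "10.0.0.3:8080"]

# Random selection: pick any server for each request.
def pick_random():
    return random.choice(SERVERS)

# Round robin: rotate through the list across requests.
_cycle = itertools.cycle(SERVERS)
def pick_round_robin():
    return next(_cycle)

for _ in range(4):
    print("sending request to", pick_round_robin())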

With the microservice architecture, client-side load balancing plays a major role. Components like Netflix Ribbon and Eureka give client-side load balancing features similar to server-side load balancing, such as fault tolerance, caching and batching.

Let’s see how Eureka and Ribbon work together to achieve the client side load balancing in microservice architecture.

 

As per the above diagram, let’s assume Microservice B wants to communicate with Microservice C. Microservice B is the client: it uses the Eureka client to find out which nodes (the server list) are available for Microservice C. Then, using the Ribbon client, Microservice B calls Microservice C with the default round robin algorithm. The method Microservice B used to call Microservice C is known as client-side load balancing, which illustrates that the client decides which server to call, not a middle component as in server-side load balancing.

 

All the links are provided as references to give more insight into the topics we have covered, especially how you can use Netflix OSS to achieve client-side load balancing in microservice solutions.

 

References

 

Read More

Twelve factors to consider when developing cloud native applications (SaaS)

This is a basic introduction to cloud-native application development. First we will get a basic idea about the cloud concept and then move on to the 12 factors.

 

What is cloud computing environment?

It’s a dynamic environment, with the capability to allocate resources from, and release resources back to, a virtualized, shared pool. This elastic environment enables more flexible scaling options for high-demand applications.

 

What is cloud native application?

Applications or processes that run in software containers as isolated units. Applications or services (microservices) are loosely coupled, with explicitly described dependencies. There is a set of best practices you need to follow when you’re planning to move your application to the cloud. These are known as the 12 factors. Let’s investigate those 12 factors.

 

 

Twelve factors

The 12-factor application methodology was drafted by developers at Heroku. The factors represent a set of guidelines or best practices for portable, resilient applications that will thrive in cloud environments (specifically software-as-a-service applications). Let’s go through each factor and get a high-level idea of how we can achieve / implement it in our application.

  1. There should be a one-to-one association between a versioned codebase and the application: The main idea is not to have different codebases for different versions of your application. You can have branches for different versions as a solution to avoid repository complexity.

 

  2. Services should explicitly declare all dependencies, and should not rely on the presence of system-level tools or libraries: The main recommendation here is to avoid depending on pre-installed, system-level software from within your application. As a solution, we should try our best to declare application dependencies in our application manifest. Tools like Apache Maven can be used to maintain these dependencies in your application.

 

  3. Configuration that varies between deployment environments should be stored in the environment: The recommendation here is to avoid having environment-specific (development, staging and production) configuration within your application code. Any environment-specific configuration needs to be stored in the environment, not in the application. To keep configuration files in a centralized location, we can use the Spring Config Server.

 

  4. All backing services are treated as attached resources, which are managed (attached and detached) by the execution environment: A backing service is any service your application integrates with to perform its normal operations; examples would be a database, web services, an SMTP server or an FTP server. The main idea of this practice is to treat those backing services like your own services.

 

  5. The delivery pipeline should have strictly separate stages: build, release, and run. Considering the three stages:
    • Build: Takes the source code and bundles it into a package referred to as the build
    • Release: Combines the build and the config and creates a release to deploy in an environment. Each release has a unique identifier and is tracked in a release management tool, which ensures quick rollback.
    • Run: Referred to as the runtime; executes the application in the corresponding environment

 

  6. Applications should be deployed as one or more stateless processes: Specifically, transient processes must be stateless and share nothing. Persisted data should be stored in an appropriate backing service.

 

  7. Self-contained services should make themselves available to other services by listening on a specified port: This means each application is self-contained and exposes access over an HTTP port that is bound to it. The Spring Boot framework is a good example of this, having an in-built server where you can configure the HTTP port easily.

 

  8. Concurrency is achieved by scaling individual processes (horizontal scaling): The idea behind this is having multiple processes with distributed load. The application should be able to scale horizontally and handle requests load-balanced across multiple identical running nodes of the application. In addition, the application should be able to scale out processes or threads for parallel execution of work on an on-demand basis. With the JVM, this comes naturally through multi-threading.

 

  9. Processes must be disposable: fast startup and graceful shutdown. To elaborate, our application should minimize startup time, for example by using backing services rather than in-memory caching. Also, during shutdown, when we stop the application it should not accept new work and should let existing work finish. We can push pending work to a queue and then shut down if required.

 

  10. All environments, from local development to production, should be as similar as possible: Keeping the development environment identical to the production environment prevents unexpected behavior in the application due to environment inconsistencies. Containerized environments like Docker are a good solution for this.

 

  11. Applications should produce logs as event streams (for example, writing to stdout and stderr), and trust the execution environment to aggregate streams: The application should not attempt to write to or manage log files. The stream output should mainly be managed by the execution environment, and can be shipped to log indexing systems such as Splunk or the ELK stack, which serve as a centralized logging system.

 

  12. Run admin/management tasks as one-off processes: If admin tasks are needed, they should be kept in source control and packaged alongside the application to ensure they run in the same environment as the application.

References

Read More