DevOps Automation worth more than a million!

The title may sound like an exaggeration of where it leads one's mind to, but the technical work I am blogging about here is no misnomer.

In our enterprise Solution R&D division I was assigned the task of automating the execution of a workload on different SUTs (System Under Test); the workload has 4 metrics and takes nearly 2-3 days to complete. Seven team members were scheduling these workloads on 15+ different SUTs almost every week, manually handling each metric one after another, and weekends in between added further delays to the end results and the SLAs. This exercise repeats roughly every 3 months, 3 times a year, and the delays carry a high price when a client is keen to buy your servers worth millions (for their on-prem/colo/hybrid cloud environments) but is not ready to wait that long, with requirements that are competitively contested by other vendors too, sometimes through an initial RFI (Request for Information) and other times by negotiating over RFPs (Request for Proposal)!

So, I developed a Jenkins framework which executes these workload metrics on different SUTs as and when requests from team members hit a queue, which is freed up every 5 minutes. It is essentially a wrapper over an already automated Python package, which did not support a multi-tenant use case. This was challenging because, being multi-tenant, the reports generated for each execution run had to be tagged to end users, SUTs and metrics all together, with debug logs required for each stage of the pipeline.
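The queueing and tagging idea can be sketched roughly as below. This is a minimal illustration, not the actual framework code: the class, field names and tag format are all hypothetical, and in reality the poll is driven by Jenkins every 5 minutes rather than a plain function call.

```python
# Hypothetical sketch: requests land in a queue; each poll dispatches runs
# whose SUT is free and tags the run with user/SUT/metric for the reports.
from collections import deque
from dataclasses import dataclass


@dataclass
class RunRequest:
    user: str
    sut_ip: str
    metric: str  # e.g. "intspeed", "fprate"


def dispatch(queue, busy_suts):
    """Pop requests whose SUT is free; return the (tag, request) pairs started."""
    started, deferred = [], deque()
    while queue:
        req = queue.popleft()
        if req.sut_ip in busy_suts:
            deferred.append(req)  # SUT busy: keep for the next 5-minute poll
            continue
        busy_suts.add(req.sut_ip)
        tag = f"{req.user}_{req.sut_ip}_{req.metric}"  # report tagging string
        started.append((tag, req))
    queue.extend(deferred)
    return started


queue = deque([RunRequest("jadmin", "10.0.0.5", "intspeed"),
               RunRequest("alice", "10.0.0.5", "fprate")])
busy = set()
runs = dispatch(queue, busy)
# The first request starts; the second stays queued because its SUT is busy.
```

The per-SUT busy set is what makes the wrapper multi-tenant: two users can target the same machine without stepping on each other's runs.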

Objective and Goal of the Automation:

Concept workflow

Dev DataOps

Automated Performance Tests Scheduling on different SUTs

SUTs –> System Under Test

The flow chart I designed for this use case is shown below.

Automation Flow-Chart

Applications:

1. Even a not-so-experienced person can execute these applications, which brings cost savings.

2. Sanity tests – iLO, SUT, OS, network, etc.

3. Regression tests – early detection of performance bugs across stacks

4. Optimization pipelines – RFI, RFP

5. In-house pipeline tools – Python/Java plugins and more…

Impact:

1. Helped successfully close RFP deals worth 5 to 8+ million US dollars. (Recent example: an SSA customer bid worth 173 million won in 2 months' time, an 8-year contract.)

2. Faster turnaround time for RFP requests, and customer delight

Although I am a Linux geek, I had to learn Windows batch scripting, as the automation framework is used to execute the application, collect logs at the system level and dump the results on an NTFS share used across the team. The initial part was to get the user's input defined as key-value pairs containing the SUT (System Under Test) IP, credentials, BIOS tunes, OS tunes, etc., so that the Jenkins framework shown in the flow chart above would read these key-value pairs and apply the BIOS tunes and OS tunes to the SUT, followed by application execution, with the choice of application/metric made in a UI-style drop-down in the Jenkins framework itself.
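Reading those key-value pairs can be sketched as below. The sample file contents and key names here are hypothetical (the real format and keys are internal), but the parsing step is the same idea:

```python
# Illustrative parser for the key-value run-parameter files described above.
# SAMPLE and all key names are assumptions, not the actual internal format.
SAMPLE = """\
SUT_IP=10.0.0.5
SUT_USER=jadmin
BIOS_TUNE=Workload_Profile:HPC
OS_TUNE=governor:performance
METRIC=intspeed
"""


def parse_run_params(text):
    """Return a dict of run parameters from KEY=VALUE lines."""
    params = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, _, value = line.partition("=")
        params[key.strip()] = value.strip()
    return params


params = parse_run_params(SAMPLE)
# e.g. params["SUT_IP"] == "10.0.0.5", params["BIOS_TUNE"] == "Workload_Profile:HPC"
```

The framework then maps entries like `BIOS_TUNE` and `OS_TUNE` onto the actual tuning commands before kicking off the application run.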

The users edit their run parameters in a text file and copy it to the Input folder; the Jenkins framework reads these files every 5 minutes and schedules the application execution on the different SUTs automatically. Once the runs complete, the results are dumped into user-specific folders with tuning-comment strings to identify which user executed a particular application on a given SUT with a specific tuning applied. This history of tunes and application results can be used to build machine learning models for recommendations on RFP/RFI requests. Errors in the execution are currently not handled automatically, but they are notified through Slack from the Jenkins workflow.
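The Input-folder pickup and the result tagging can be sketched as below, assuming hypothetical paths and naming conventions (the real framework runs this inside the Jenkins pipeline):

```python
# Minimal sketch of the folder-polling step: pick up test*.dat files dropped
# by users, and build user-specific result paths with a tuning-comment string.
# Paths and naming are illustrative assumptions.
import glob
import os
import tempfile


def collect_requests(input_dir):
    """Return the test*.dat files users dropped into the Input folder."""
    return sorted(glob.glob(os.path.join(input_dir, "test*.dat")))


def result_folder(base, user, sut_ip, tune_comment):
    """Build the user-specific result path tagged with the tuning comment."""
    return os.path.join(base, user, f"{sut_ip}_{tune_comment}")


# Demo with a throwaway directory standing in for the shared Input folder.
with tempfile.TemporaryDirectory() as d:
    open(os.path.join(d, "test1.dat"), "w").close()
    open(os.path.join(d, "notes.txt"), "w").close()  # ignored: wrong pattern
    names = [os.path.basename(p) for p in collect_requests(d)]
```

In the real setup this poll fires every 5 minutes from Jenkins, and a failed run would trigger the Slack notification instead of writing results.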

Some screenshots of the automation are shown below:

Input folder with test*.dat file containing key-value pairs
Results of various systems and applications like intspeed, fpspeed, fprate, etc. with the jadmin user
The pipeline can also be launched manually with a choice of different systems
Jenkins pipeline used for the above

Since the SUTs on which the application is launched had to be on an isolated network (for performance tests), the JVM parameters used to launch the Jenkins agents were constrained so that the agents have no impact on the performance results.
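The exact parameters are in the screenshot; purely as an illustration (the flag values here are assumptions, not the ones actually used), a Jenkins inbound agent can be launched with a small, fixed JVM footprint along these lines:

```
java -Xms256m -Xmx512m -XX:+UseSerialGC \
     -jar agent.jar -jnlpUrl http://<jenkins>/computer/<node>/jenkins-agent.jnlp -secret <secret>
```

Capping the heap and using a lightweight GC keeps the agent's CPU and memory interference on the SUT to a minimum during a measured run.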

Different users are controlled through matrix-based authentication in Jenkins

users
Matrix-based user authentication
Different app choices
Slack notification plugin in Jenkins to report test workflow results

The rest of the Python, Windows batch and Jenkins Groovy scripts are at the github link.

Due to paucity of time and some company policy constraints, I am unable to blog in detail, but reach out to me for a demo or details on the workflow and the Python/Groovy scripts.
