Performance Data Analysis: R Scripts

In this blog post, I would like to introduce some simple R scripts that I run in RStudio to analyze performance data collected with the turbostat tool. Turbostat is a Linux command-line utility that reports not only the frequency of every core but also C-state information, which makes it useful during the performance runs of the various configurations we test in our benchmarking exercises.

For those who are not familiar with the benchmarking world, let me explain things in a nutshell. When e-commerce, banking, insurance, and other financial companies try to grow their business with more customers, they upgrade their hardware, and so they look for the servers on the market that give the best value for money. "Best" here means the servers that deliver the best performance or price-performance on industry-standard benchmarks, for instance those from consortia such as SPEC (www.spec.org). These servers may in turn belong to on-premise or cloud infrastructure.

Without further ado, let me introduce the R scripts I am using for performance data analysis. In the future, I plan to provide this as a dashboard, so that it is available as a service to anyone who wants to use it on their turbostat raw data.

You should have both R and RStudio installed on your laptop. Search online for instructions, or refer to https://www.ics.uci.edu/~sternh/courses/210/InstallingRandRStudio.pdf

Note that you have to install R first and then install RStudio.
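Once R is installed, the two packages used by the script in this post (readr and ggplot2) need to be present as well. Here is a minimal sketch for checking that from the R console. The variable names are only illustrative, and the snippet prints the install command instead of running it.

```r
# Packages used by the script in this post
needed <- c("readr", "ggplot2")

# requireNamespace() returns FALSE (instead of raising an error) when a package is absent
is_installed <- vapply(needed, requireNamespace, logical(1), quietly = TRUE)
not_installed <- needed[!is_installed]

if (length(not_installed) > 0) {
  # Printed rather than executed, so nothing gets installed without your say-so
  message("Run: install.packages(c(",
          paste0('"', not_installed, '"', collapse = ", "), "))")
} else {
  message("readr and ggplot2 are both available")
}
```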

R script:

1) library(readr)

Loads the readr package, which is required for reading the given dataset into RStudio.

2) library("ggplot2")

Loads ggplot2, which is used for the visualizations you want to plot.

3) turbostat_data <- read_table2(file.choose(), col_names=TRUE, na="NA")

Here I am creating a dataset so that its columns can be plotted. (In readr 2.0 and later, read_table2() is deprecated in favor of read_table().)

4) summary(turbostat_data)

This prints summary statistics for each column of the dataset in the RStudio console.

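5) cpu <- subset(turbostat_data, select = c(Core, Avg_MHz, `Busy%`))

6) colnames(cpu) <- c("Core", "Avg_MHz", "cpuBusy")

Names the columns that supply the y-axis values.

7) x <- seq(1, 5, length = nrow(cpu))

Builds the x-axis: an evenly spaced index standing in for time.
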
8) ggplot(cpu, aes(x=x, y=Avg_MHz, colour = Core)) + geom_line() + labs(title="Workload sles15_dl380-SUT_16GB_DIMM_2GHz_2dpc Core Operating Frequency vs Time", caption = "Source: Trim_turbostat-16gbdimm-2GHz-_2dpc", subtitle="redis")

9) ggsave("\\Graphs\\1-jan-2019rom-Workload_sles15_Standalone_Server_16GB_DIMM_2GHz_2dpc_Core_Freq_vs_time.jpeg", plot = last_plot(), device = NULL)

The above saves the graph to the given path on the laptop where you have R and RStudio installed.

10) cpudata <- subset(turbostat_data, `Busy%` < 100, select = c(Core, Avg_MHz, `Busy%`))

11) colnames(cpudata) <- c("Core", "Avg_MHz", "cpuBusy")

12) p <- seq(1, 5, length = nrow(cpudata))

13) ggplot(cpudata, aes(x=p, y=cpuBusy, colour = Core)) + geom_line() + labs(title="Workload sles15_dl380-SUT_16GB_DIMM_2GHz_2dpc Core Utilizations vs Time", caption = "Source: Trim_turbostat-16gbdimm-2GHz-_2dpc", subtitle="redis")

14) ggsave("Graphs\\1-jan-2019rom-Workload_sles15_Standalone_Server_16GB_DIMM_2GHz_2dpc_Core_Util_vs_time.jpeg", plot = last_plot(), device = NULL)

15) p <- seq(1, 5, length = nrow(turbostat_data))

16) ggplot(turbostat_data, aes(x=p, y=CoreTmp, colour = Core)) + geom_line() + labs(title="Workload sles15_dl380-SUT_16GB_DIMM_2GHz_2dpc Core Temp vs Time", caption = "Source: Trim_turbostat-16gbdimm-2GHz-_2dpc", subtitle="redis")

17) ggsave("Graphs\\1-jan-2019rom-Workload_sles15_Standalone_Server_16GB_DIMM_2GHz_2dpc_Core_Temp_vs_time.jpeg", plot = last_plot(), device = NULL)
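Steps 8, 13 and 16 repeat the same plotting pattern, so it can be folded into one small function. The sketch below is only an illustration: plot_metric() is a hypothetical helper, not part of the script above, and the demo values are made up. It also converts Core to a factor. When Core is read as a number, ggplot2 treats it as a continuous colour scale and does not split the data into one line per core, so the conversion assumes one line per core is what is wanted.

```r
library(ggplot2)

# Hypothetical helper: plot one turbostat column against a sample index, one line per core
plot_metric <- function(data, metric, title) {
  # Same pseudo time axis as the script: evenly spaced values from 1 to 5
  data$sample <- seq(1, 5, length.out = nrow(data))
  # factor() makes ggplot2 draw a separate, discretely coloured line for each core
  data$Core <- factor(data$Core)
  ggplot(data, aes(x = sample, y = .data[[metric]], colour = Core)) +
    geom_line() +
    labs(title = title, x = "Sample", y = metric)
}

# Made-up values, only to exercise the helper
demo <- data.frame(
  Core    = rep(0:1, times = 4),
  Avg_MHz = c(1900, 1850, 2000, 1990, 1980, 2010, 1700, 1995),
  CoreTmp = c(52, 50, 55, 54, 57, 55, 53, 56)
)

freq_plot <- plot_metric(demo, "Avg_MHz", "Core frequency vs time")
temp_plot <- plot_metric(demo, "CoreTmp", "Core temperature vs time")
```

Each returned plot can be passed to ggsave() through its plot argument, in the same way as steps 9, 14 and 17.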

The visualizations look something like the ones below.

[Figures for the 1-jan-2019rom sles15 Standalone_Server_16GB_DIMM_2933_2dpc run, top to bottom: Core_Temp_vs_time, Core_Util_vs_time, Core_Freq_vs_time]

From top to bottom, the first graph shows core temperatures over the course of workload execution, whereas the second and third graphs show how core utilization and core frequency vary as the workload progresses.

So how can we use this for performance analysis?
In the core frequency vs. workload execution graph, we can check whether the core frequency is as expected from the processor specifications; if it is not, we can raise an alarm with the ROM team to look into the issue.
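The same check can be done numerically instead of by eye. The following is a minimal sketch in base R, under stated assumptions: the sample values are made up, and expected_mhz and tolerance are illustrative parameters that you would set from the processor specification and your own judgment.

```r
# Made-up samples in the same shape as the cpu subset (Core, Avg_MHz)
cpu <- data.frame(
  Core    = rep(0:2, each = 3),
  Avg_MHz = c(1990, 2005, 1998, 1400, 1450, 1420, 2001, 1995, 2003)
)

expected_mhz <- 2000   # assumed value; take it from the processor specification
tolerance    <- 0.90   # assumed: flag cores averaging below 90% of expected

# Mean frequency per core over the whole run
per_core <- aggregate(Avg_MHz ~ Core, data = cpu, FUN = mean)
per_core$below_expected <- per_core$Avg_MHz < tolerance * expected_mhz

print(per_core)
```

Keep in mind that turbostat's Avg_MHz counts idle time too, so a lightly loaded core can read low for that reason alone. If your capture includes the Bzy_MHz column, that may be the better one to compare against the specification.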

Another check is the core utilization, or the SUT (system under test) utilization, achieved by the workload. If it is not at 80% or above, we need to find out why, as it might point to a performance bottleneck in the hardware or the software stack. One can write scripts for many use cases like this, keep them as templates, and reuse them for regression analysis.
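As an example of such a template, here is a rough sketch of the utilization check in base R. The data is made up, the cpuBusy column name follows the renamed column from the script, and target_busy is simply the 80% rule of thumb mentioned above.

```r
# Made-up samples in the same shape as cpudata (Core, cpuBusy)
cpudata <- data.frame(
  Core    = rep(0:2, each = 3),
  cpuBusy = c(92, 88, 95, 40, 35, 45, 81, 85, 83)
)

target_busy <- 80  # percent; the rule-of-thumb figure from the text

# Mean utilization per core, with a flag for cores that stay under the target
per_core_busy <- aggregate(cpuBusy ~ Core, data = cpudata, FUN = mean)
per_core_busy$under_target <- per_core_busy$cpuBusy < target_busy

# Overall utilization across all cores and samples
sut_busy <- mean(cpudata$cpuBusy)

print(per_core_busy)
print(sut_busy)
```

Running the same check on the data from two different runs gives a quick regression comparison between them.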

I know there are many visualization tools available to do this job, but the advantage of R and RStudio is that once these scripts are ready you can get the plots very quickly compared to Excel. This depends on individual preference and on the tool the end user is most comfortable with. The underlying idea is to understand performance anomalies and identify issues early.

Future work:

I am working on integrating this into JupyterLab and making it configurable, using Python, for any dataset from under 1 GB up to 1 TB in size. I will keep you posted on this. I hope this is useful.

Disclaimer:

The postings on this site are my own and do not necessarily represent my employer’s positions, strategies, or opinions.

Shreeharsha GN
