How to Investigate Performance Issues in Software

Performance is one of the most critical quality attributes of software systems. Often, performance-related issues are reported or detected late in the development lifecycle, making root cause analysis crucial. Without a structured approach, project teams may end up investigating random areas and applying arbitrary fixes, which often do not work.

Common Types of Performance Issues

Understanding the distinction between different performance issues is essential, as the term "performance issue" is often used broadly. Below are the key types of performance problems:

1. Response Time Issues

This occurs when an application takes too long to respond, making it difficult for users to complete their tasks efficiently. A lack of well-defined response time requirements can lead to disagreements about whether an issue exists.

For example, some companies set internal benchmarks such as:

2 seconds for login pages
9 seconds for other normal pages
A maximum of 2 seconds for any database query to execute

However, these are not universal standards. The acceptable response time depends on factors such as:

Network latency
Response time from dependent systems
Database performance

Common Causes of Poor Response Time:

Network latency
Hardware limitations
Software inefficiencies (e.g., poor architecture, inefficient code algorithms)

2. Scalability Issues

An application may perform well under normal load but slow down, stop responding, or crash as the number of users increases.

Some teams attempt to solve this by adding more memory or scaling horizontally, but if the root cause is poor architecture, design, or inefficient code, these solutions will only delay the inevitable.

Common Causes of Scalability Issues:

Insufficient hardware resources
Poor architecture & design
Inefficient code

3. Availability Issues

A system may crash intermittently or after a specific period. Sometimes, these issues stem from robustness problems rather than performance constraints. However, in many cases, performance problems such as memory leaks contribute to system instability.

Common Causes of Availability Issues:

Memory leaks in the code
Poor application robustness
Hardware failures
Network issues

How to Investigate Performance Issues in Software Applications

Before beginning an investigation, it is essential to ask the right questions, as performance issues reported by end users often lose clarity by the time they reach developers or architects.

Key Questions to Ask:

How many users can comfortably work on the application?
What are the maximum, minimum, and average user loads? How many users are concurrent, and how many use the application in total per day?
Which specific use cases or scenarios exhibit performance issues?
What kind of performance problems are being reported (e.g., long initial loading, slow subsequent requests, server crashes under high user load)?
What are the quantitative performance metrics (e.g., page load takes 5 minutes, server crashes with 25 concurrent users)?
What are the server statistics during peak load (CPU utilization, memory usage, disk I/O, etc.)?
What performance level was the application designed for?

Since many performance issues originate from database operations and network round trips, investigations should typically begin there, once hardware capacity and network latency have been ruled out as causes.

Investigation for Response Time Issues

Step 1: Database Investigation

What to Do?

Database profiling
Static review of database queries and stored procedures

How to Do It?

Use static analysis tools
Conduct a database architecture review (queries, indexes, configurations)
Utilize DB profiling tools (e.g., SQL Server Profiler for Microsoft SQL Server, AWR reports for Oracle)

Example Checks:

Query execution time under single-user and concurrent-user scenarios
Proper usage of indexes
Avoidance of full table scans and Cartesian products
Minimization of SELECT * queries

Example Recommendations:

Use indexes on frequently queried columns
Optimize joins to prevent unnecessary Cartesian products
Avoid excessive or redundant indexes
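The index and full-table-scan checks above can be tried out on a small scale before touching a production database. The sketch below uses SQLite through Python's standard sqlite3 module purely for illustration; the orders table, its columns, and the index name are hypothetical, and other database engines expose their plans through their own tools.

```python
import sqlite3


def query_plan(conn: sqlite3.Connection, sql: str, params: tuple = ()) -> str:
    """Return the human-readable detail of SQLite's plan for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last column of each row holds the plan description.
    return "; ".join(row[-1] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 100, i * 1.5) for i in range(10_000)],
)

sql = "SELECT id, total FROM orders WHERE customer_id = ?"

plan_before = query_plan(conn, sql, (42,))  # no index on customer_id yet
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = query_plan(conn, sql, (42,))  # index now available

print("before:", plan_before)
print("after: ", plan_after)
```

Before the index exists, the plan reports a scan of the whole table; afterwards it reports a search that uses idx_orders_customer. The exact wording of the plan text varies between SQLite versions.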

Step 2: Web/App Server-Side Investigation

What to Do?

Static code review for architecture and design flaws
Performance testing for different scenarios
Memory profiling
Monitoring system statistics (CPU, memory, etc.)

How to Do It?

Use static analysis tools
Employ profilers
Conduct an architectural review
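As one way of employing a profiler, the sketch below runs a function under Python's built-in cProfile and prints the most expensive calls. The functions slow_lookup and fast_lookup are hypothetical examples of an inefficient algorithm and its fix; a real investigation would profile the actual request-handling code.

```python
import cProfile
import io
import pstats


def slow_lookup(items: list, targets: list) -> int:
    """Hypothetical hotspot: every membership test scans the whole list."""
    return sum(1 for t in targets if t in items)


def fast_lookup(items: list, targets: list) -> int:
    """Same result, using a set for constant-time membership tests."""
    item_set = set(items)
    return sum(1 for t in targets if t in item_set)


def profile_report(func, *args) -> str:
    """Run func under cProfile and return the top entries by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    func(*args)
    profiler.disable()
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(5)
    return buffer.getvalue()


items = list(range(2_000))
targets = list(range(0, 4_000, 2))

print(profile_report(slow_lookup, items, targets))
print(profile_report(fast_lookup, items, targets))
```

Comparing the two reports shows where the time goes in each version, which is the kind of evidence that should drive a fix rather than guesswork.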

Example Checks:

Excessive remote calls in loops
Frequent fetching of common data from the database instead of caching it
Heavy session state usage
Inefficient pagination techniques

Example Recommendations:

Use batch queries and prepared statements
Implement caching for frequently accessed data
Avoid excessive session state storage
Apply proper pagination techniques
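Two of the recommendations above, batching queries and caching frequently accessed data, can be sketched as follows. This is an illustration under stated assumptions rather than a definitive implementation: the products table, the stats counter, and all function names are hypothetical, and SQLite stands in for whatever database the application uses.

```python
import sqlite3
from functools import lru_cache

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO products (id, name) VALUES (?, ?)",
    [(i, f"product-{i}") for i in range(1, 101)],
)

# Counts database round trips so the two approaches can be compared.
stats = {"queries": 0}


def run_query(sql: str, params: tuple = ()) -> list:
    stats["queries"] += 1
    return conn.execute(sql, params).fetchall()


def names_one_by_one(ids: list) -> dict:
    """Anti-pattern: one query per id inside a loop."""
    result = {}
    for product_id in ids:
        rows = run_query("SELECT name FROM products WHERE id = ?", (product_id,))
        result[product_id] = rows[0][0]
    return result


def names_batched(ids: list) -> dict:
    """One parameterized query for the whole batch of ids."""
    placeholders = ", ".join("?" for _ in ids)
    sql = f"SELECT id, name FROM products WHERE id IN ({placeholders})"
    return dict(run_query(sql, tuple(ids)))


@lru_cache(maxsize=1)
def all_product_names() -> tuple:
    """Cache rarely changing reference data instead of refetching it."""
    return tuple(run_query("SELECT id, name FROM products ORDER BY id"))


ids = list(range(1, 21))
print(names_batched(ids) == names_one_by_one(ids), stats["queries"])
```

The loop issues one query per id, while the batched version issues a single query for the same result. A cache like the one on all_product_names needs an invalidation strategy (for example, calling all_product_names.cache_clear() after an update) whenever the underlying data can change.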

Step 3: Client-Side Investigation

What to Do?

Static review of front-end code
Analyze network communication

How to Do It?

Use browser developer tools
Conduct architectural reviews

Example Checks:

Multiple unnecessary service calls for a single operation
Inefficient use of AngularJS watchers ($watch, $watchCollection)

Example Recommendations:

Use coarse-grained service calls to minimize network round trips
Optimize UI/UX frameworks by following best practices

Investigation for Scalability Issues

What to Do?

Conduct load testing
Perform an architectural review if load testing indicates bottlenecks

How to Do It?

Use load testing tools (e.g., LoadRunner, JMeter, OpenSTA)
Conduct manual small-scale user load tests for initial investigation
Perform architectural analysis
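For an initial small-scale load test, a short script can be enough to see how latency and throughput change as concurrency rises. The sketch below uses only the Python standard library. The function simulated_request is a hypothetical placeholder that just sleeps; in practice it would be replaced with a real call to the system under test, and a dedicated tool such as JMeter remains the better choice for full-scale tests.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def simulated_request(request_id: int) -> float:
    """Hypothetical stand-in for one user action; returns its latency in seconds."""
    start = time.perf_counter()
    time.sleep(0.01)  # pretend the server needs about 10 ms
    return time.perf_counter() - start


def run_load(concurrent_users: int, requests_per_user: int) -> dict:
    """Issue requests from a pool of worker threads and summarize the results."""
    total = concurrent_users * requests_per_user
    started = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrent_users) as pool:
        latencies = list(pool.map(simulated_request, range(total)))
    elapsed = time.perf_counter() - started
    return {
        "users": concurrent_users,
        "requests": total,
        "throughput_rps": total / elapsed,
        "avg_latency_s": sum(latencies) / total,
        "max_latency_s": max(latencies),
    }


for users in (1, 5, 25):
    print(run_load(users, requests_per_user=4))
```

If average latency climbs or throughput flattens as the user count increases, that is the signal to proceed to the architectural review.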

Example Recommendations:

Improve architecture (e.g., decoupling components, asynchronous processing)
Add more nodes with load balancing
Move to a 64-bit architecture if the application is constrained by the memory limits of a 32-bit address space

Fact Collection for Performance Analysis

To find the root cause and provide recommendations, an architect must collect relevant data:

Define Performance Objectives

Expected response time

S.No.	Scenario/page	Response time in seconds
1
2

Required throughput

S.No.	Req/second	Transaction/second	bytes/second
1
2

Acceptable resource utilization

S.No.	% CPU Utilisation	% Memory Utilisation	Network IO
1
2

Key performance workload scenarios

S.No.	Scenario	Description
1
2
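Once the objectives are written down, they can be checked mechanically against measurements. The following is a minimal sketch, assuming response time objectives are kept per scenario. The Objective class, the scenario names, and the measured values are all hypothetical; the 2-second and 9-second limits reuse the example benchmarks mentioned earlier.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Objective:
    scenario: str
    max_response_s: float


def find_violations(objectives: list, measured: dict) -> list:
    """Return (scenario, measured, limit) for each scenario over its limit.

    Scenarios without a measurement are skipped rather than reported.
    """
    violations = []
    for objective in objectives:
        value = measured.get(objective.scenario)
        if value is not None and value > objective.max_response_s:
            violations.append((objective.scenario, value, objective.max_response_s))
    return violations


objectives = [Objective("login", 2.0), Objective("search results", 9.0)]
measured = {"login": 3.4, "search results": 6.1}  # hypothetical measurements

violations = find_violations(objectives, measured)
for scenario, value, limit in violations:
    print(f"{scenario}: {value:.1f}s exceeds the {limit:.1f}s objective")
```

Keeping the objectives in a form like this makes it harder for a disagreement about whether a performance issue exists to go unresolved.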

Gather Inputs from Various Teams

Development Team: Application details and known bottlenecks

S.No.	Questions	Answer
1	What is the performance problem?
2	Is the initial loading of the page or application slow?
3	Is every request to a page or scenario slow? If so, which scenarios?
4	Does the server crash, go down, or become unavailable often?
5	If the answer to Q4 is yes, at how many users does the server crash?
6	If the answer to Q4 is yes, does it happen when a specific page or scenario is accessed?
7	Is caching used in the DB access layer of the application?
8	Is caching used in the application layer?
9	Does the application use an ORM tool?
10	Is pagination implemented in the application?
11	If pagination is implemented, is page data fetched from the database every time, or from the session?
12	Is remoting (RMI, Remote EJB, .NET Remoting) used in the application?
13	Is the application server colocated with the database server on the same machine or subnetwork?
14	Is load balancing used in the system? If so, is it hardware or software load balancing?
15	Is table partitioning used?
16	Are there long-running transactions, such as payment gateway calls or calls to many systems?
17	Are there federated transactions?

IT Operations Team: User load and infrastructure constraints

Inputs related to various systems

S.No.	Web/App/DB Server	Software /Application deployed	Application technology	Physical Memory	Processor	Type of Hard Disc	Memory Utilisation		Processor Utilisation
							Average	Peak/Max	Average	Peak
1
2

Inputs related to user load

S.No.	Load type	Concurrent	Per day (any time in a day)	Dominating scenario/usecase
1	Maximum no of users
2	Minimum number of users
3	Average number of users (most common)
4	Maximum no of concurrent user load beyond which performance problems appear

Business Users: Feedback on performance issues

S.No.	Use Case/Scenario	Problem	Time to complete scenario
			Max	Most Common	Min
1
2

Testing Team: Performance test results

S.No.	Page/Screen/Transaction	Time to load/Response time in second
		Max	Average	Min
1
2
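The Max, Average, and Min columns of a table like this can be filled in from raw timing samples. The sketch below assumes the test results are available as (page, seconds) pairs; the page names and timings are hypothetical.

```python
from collections import defaultdict
from statistics import fmean


def summarize(samples: list) -> dict:
    """Group (page, seconds) samples by page and compute max, average, and min."""
    by_page = defaultdict(list)
    for page, seconds in samples:
        by_page[page].append(seconds)
    return {
        page: {"max": max(times), "avg": fmean(times), "min": min(times)}
        for page, times in by_page.items()
    }


# Hypothetical response times in seconds collected during a test run.
samples = [
    ("login", 1.0),
    ("login", 3.0),
    ("login", 2.0),
    ("report", 7.5),
    ("report", 8.5),
]

summary = summarize(samples)
for number, (page, row) in enumerate(summary.items(), start=1):
    # Tab-separated to match the table layout used in this article.
    print(f"{number}\t{page}\t{row['max']}\t{row['avg']}\t{row['min']}")
```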

Database Administrators (DBAs): Query performance and indexing strategies

S.No.	Usecase	Query/Procedure	Query/Proc execution time in second
			Max	Average	Min
1
2

Conclusion

Investigating performance issues requires a structured approach, starting from database and network layers before moving to application and client-side code. A well-defined investigation strategy ensures that the root cause is identified and effective solutions are applied instead of temporary fixes.

By following best practices and using appropriate tools, software teams can diagnose and address performance bottlenecks efficiently, leading to a more stable and scalable application.



Jageshwar Tripathi
