
How to Investigate Performance Issues in Software

 Performance is one of the most critical quality attributes of software systems. Often, performance-related issues are reported or detected late in the development lifecycle, making root cause analysis crucial. Without a structured approach, project teams may end up investigating random areas and applying arbitrary fixes, which often do not work.

Common Types of Performance Issues

Understanding the distinction between different performance issues is essential, as the term "performance issue" is often used broadly. Below are the key types of performance problems:

1. Response Time Issues

This occurs when an application takes too long to respond, making it difficult for users to complete their tasks efficiently. A lack of well-defined response time requirements can lead to disagreements about whether an issue exists.

For example, some companies set internal benchmarks such as:

  • 2 seconds for login pages

  • 9 seconds for other normal pages

  • A maximum of 2 seconds for any database query to execute
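To check measured timings against budgets like these, a team might compare a high percentile of the collected samples with each category's limit. The sketch below is only an illustration: the category names, the budget values (copied from the example benchmarks above), the sample timings and the choice of the 95th percentile are all assumptions, not a standard.

```python
import statistics

# Example budgets in seconds, mirroring the illustrative internal benchmarks above.
BUDGETS_SECONDS = {
    "login_page": 2.0,
    "normal_page": 9.0,
    "db_query": 2.0,
}

def p95(samples: list[float]) -> float:
    """Return the 95th percentile of the samples (inclusive method)."""
    # quantiles(n=20) returns 19 cut points; the last one is the 95th percentile.
    return statistics.quantiles(samples, n=20, method="inclusive")[-1]

def budget_violations(measured: dict[str, list[float]]) -> dict[str, float]:
    """Return {category: p95} for every category whose p95 exceeds its budget."""
    violations = {}
    for category, samples in measured.items():
        observed = p95(samples)
        if observed > BUDGETS_SECONDS[category]:
            violations[category] = observed
    return violations

# Hypothetical measurements, e.g. collected from access logs or a test run.
measured = {
    "login_page": [1.1, 1.3, 1.2, 2.8, 1.4, 1.2, 3.1, 1.0, 1.5, 1.3],
    "normal_page": [4.0, 5.2, 6.1, 4.8, 5.5],
    "db_query": [0.2, 0.4, 0.3, 0.5, 0.2],
}
print(budget_violations(measured))
```

Using a percentile rather than the average keeps a few very slow requests from being hidden by many fast ones; here the login page is flagged even though most of its samples are well under 2 seconds.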

However, these are not universal standards. The acceptable response time depends on factors such as:

  • Network latency

  • Response time from dependent systems

  • Database performance

Common Causes of Poor Response Time:

  • Network latency

  • Hardware limitations

  • Software inefficiencies (e.g., poor architecture, inefficient code algorithms)

2. Scalability Issues

An application may perform well under normal load but slow down, stop responding, or crash as the number of users increases.

Some teams try to solve this by adding more memory or faster hardware (vertical scaling) or more servers (horizontal scaling), but if the root cause is poor architecture, design, or inefficient code, these measures only delay the inevitable.
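The arithmetic behind "delaying the inevitable" can be made concrete. Suppose, purely as an assumption for illustration, that a design flaw makes total work grow with the square of the number of active users (for example, each user's request processes data produced by every other user). The hypothetical sketch below compares how many users a given capacity supports under linear and quadratic cost before and after doubling the hardware.

```python
def max_users(capacity: float, cost_exponent: float) -> float:
    """Users supported when total load grows as users ** cost_exponent.

    Solves users ** cost_exponent == capacity for users.
    """
    return capacity ** (1.0 / cost_exponent)

baseline = 10_000.0  # hypothetical capacity in abstract "work units"

for exponent in (1.0, 2.0):
    before = max_users(baseline, exponent)
    after = max_users(2 * baseline, exponent)  # double the hardware
    print(f"exponent={exponent}: {before:.0f} -> {after:.0f} users "
          f"(x{after / before:.2f})")
```

With linear cost, doubling capacity doubles the supported users; with the assumed quadratic cost it raises them only by a factor of √2, about 1.41 (from 100 to roughly 141 users here), which is why fixing the algorithm usually pays off more than adding hardware.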

Common Causes of Scalability Issues:

  • Insufficient hardware resources

  • Poor architecture & design

  • Inefficient code

3. Availability Issues

A system may crash intermittently or after a specific period. Sometimes, these issues stem from robustness problems rather than performance constraints. However, in many cases, performance problems such as memory leaks contribute to system instability.

Common Causes of Availability Issues:

  • Memory leaks in the code

  • Poor application robustness

  • Hardware failures

  • Network issues
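A common first check for a suspected memory leak is to run the same operation many times and see whether allocated memory keeps growing. The sketch below uses Python's standard-library tracemalloc module purely as an illustration (Java, .NET and other runtimes have their own heap profilers); the two handlers and the unbounded _cache list are hypothetical stand-ins for application code.

```python
import tracemalloc
from collections.abc import Callable

_cache: list[bytes] = []  # hypothetical cache that is never evicted

def leaky_handler() -> None:
    """Simulated request handler that leaks: it keeps every payload forever."""
    _cache.append(bytes(10_000))

def clean_handler() -> None:
    """Simulated request handler whose payload is freed when it returns."""
    payload = bytes(10_000)
    del payload

def memory_growth(handler: Callable[[], None], iterations: int = 200) -> int:
    """Return how many bytes remain allocated after calling handler repeatedly."""
    tracemalloc.start()
    before, _peak = tracemalloc.get_traced_memory()
    for _ in range(iterations):
        handler()
    after, _peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return after - before

print("leaky handler growth (bytes):", memory_growth(leaky_handler))
print("clean handler growth (bytes):", memory_growth(clean_handler))
```

In a real service the same idea applies over a longer horizon: memory that climbs steadily across hours of similar traffic, instead of leveling off, is a strong hint of a leak.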


How to Investigate Performance Issues in Software Applications

Before beginning an investigation, it is essential to ask the right questions, as performance issues reported by end users often lose clarity by the time they reach developers or architects.

Key Questions to Ask:

  • How many users can comfortably work on the application?

  • What is the maximum, minimum, and average user load? How many users are concurrent vs. total in a day?

  • Which specific use cases or scenarios exhibit performance issues?

  • What kind of performance problems are being reported (e.g., long initial loading, slow subsequent requests, server crashes under high user load)?

  • What are the quantitative performance metrics (e.g., page load takes 5 minutes, server crashes with 25 concurrent users)?

  • What are the server statistics during peak load (CPU utilization, memory usage, disk I/O, etc.)?

  • What performance level was the application designed for?
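When the answers arrive as a daily total rather than a concurrency figure, a rough conversion is possible with Little's law (average concurrency = arrival rate × time each user stays active). This is a swapped-in estimation technique, not part of the original checklist, and the figures below (users per day, active hours, session length) are hypothetical.

```python
def estimated_concurrent_users(users_per_day: int,
                               active_hours: float,
                               avg_session_minutes: float) -> float:
    """Little's law: L = lambda * W.

    Assumes users arrive evenly over the active hours, which real traffic
    rarely does, so treat the result as an average, not a peak.
    """
    arrival_rate_per_minute = users_per_day / (active_hours * 60)
    return arrival_rate_per_minute * avg_session_minutes

# Hypothetical: 4,800 users spread over an 8-hour working day,
# each staying active for about 15 minutes.
print(estimated_concurrent_users(4_800, 8, 15))
```

With these inputs the estimate is 150 concurrent users on average; because real traffic clusters around peaks such as the start of the business day, peak concurrency will usually be higher.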

Because database operations and network round trips are among the most common sources of performance problems, investigations typically begin there, once infrastructure capacity and basic network latency have been ruled out.


Investigation for Response Time Issues

Step 1: Database Investigation

What to Do?

  • Database profiling

  • Static review of database queries and stored procedures

How to Do It?

  • Use static analysis tools

  • Conduct a database architecture review (queries, indexes, configurations)

  • Use DB profiling tools (e.g., SQL Server Profiler for Microsoft SQL Server, AWR (Automatic Workload Repository) reports for Oracle)

Example Checks:

  • Query execution time under single-user and concurrent-user scenarios

  • Proper usage of indexes

  • Avoidance of full table scans and Cartesian products

  • Minimization of SELECT * queries
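Checking whether a query uses an index or falls back to a full table scan is usually done by reading its execution plan. SQL Server, Oracle and other engines each have their own tools for this; the self-contained sketch below uses SQLite (bundled with Python) only because it runs anywhere, and the orders table and index name are hypothetical. The exact wording of SQLite's plan text varies between versions, so the sketch looks only for the keywords SCAN and the index name.

```python
import sqlite3

def query_plan(conn: sqlite3.Connection, sql: str, params=()) -> str:
    """Return SQLite's EXPLAIN QUERY PLAN output as a single string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # Each row has four columns; the last one ('detail') is the readable text.
    return " | ".join(row[3] for row in rows)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)")
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 100, i * 1.5) for i in range(1_000)],
)

# Select only the needed columns instead of SELECT *.
sql = "SELECT id, total FROM orders WHERE customer_id = ?"

before = query_plan(conn, sql, (42,))
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
after = query_plan(conn, sql, (42,))

print("before index:", before)  # expected to describe a SCAN of orders
print("after index: ", after)   # expected to mention idx_orders_customer
```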

Example Recommendations:

  • Use indexes on frequently queried columns

  • Optimize joins to prevent unnecessary Cartesian products

  • Avoid excessive or redundant indexes
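An accidental Cartesian product usually comes from a join with a missing or wrong join condition: every row on one side is paired with every row on the other. The sketch below uses an in-memory SQLite database with hypothetical customers and orders tables to show how quickly the row count explodes compared with a correctly written join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
""")
conn.executemany("INSERT INTO customers (id, name) VALUES (?, ?)",
                 [(i, f"customer-{i}") for i in range(200)])
conn.executemany("INSERT INTO orders (customer_id, total) VALUES (?, ?)",
                 [(i % 200, 10.0) for i in range(1_000)])

# Missing join condition: every customer is paired with every order.
cartesian = conn.execute(
    "SELECT COUNT(*) FROM customers, orders"
).fetchone()[0]

# Correct join: each order is matched only to its own customer.
joined = conn.execute(
    "SELECT COUNT(*) FROM customers c JOIN orders o ON o.customer_id = c.id"
).fetchone()[0]

print(cartesian, joined)
```

Here the missing condition produces 200,000 rows instead of 1,000, and the gap grows multiplicatively as either table grows.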

Step 2: Web/App Server-Side Investigation

What to Do?

  • Static code review for architecture and design flaws

  • Performance testing for different scenarios

  • Memory profiling

  • Monitoring system statistics (CPU, memory, etc.)

How to Do It?

  • Use static analysis tools

  • Employ profilers

  • Conduct an architectural review
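A profiler shows where server-side time is actually spent before anyone starts rewriting code. Each platform has its own profilers; as an illustration, the sketch below uses Python's built-in cProfile and pstats modules on a hypothetical request handler whose slow_lookup function repeats a linear list search inside a loop.

```python
import cProfile
import io
import pstats

def slow_lookup(items: list[int], targets: list[int]) -> int:
    """Hypothetical hotspot: a linear 'in' search on a list, repeated per target."""
    return sum(1 for t in targets if t in items)

def handle_request() -> int:
    items = list(range(5_000))
    targets = list(range(0, 10_000, 7))
    return slow_lookup(items, targets)

profiler = cProfile.Profile()
profiler.enable()
handle_request()
profiler.disable()

# Print the five entries with the highest cumulative time.
report = io.StringIO()
stats = pstats.Stats(profiler, stream=report).sort_stats("cumulative")
stats.print_stats(5)
print(report.getvalue())
```

In this example the profile points at slow_lookup, and the fix it suggests is converting items to a set so each membership test takes constant time on average.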

Example Checks:

  • Excessive remote calls in loops

  • Frequent fetching of common data from the database instead of caching it

  • Heavy session state usage

  • Inefficient pagination techniques
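Each remote call made inside a loop pays a full network round trip, so the cost grows with the number of items processed. The sketch below counts round trips against a hypothetical FakeProductService (a stand-in for any remote API or database); its get_price and get_prices methods are invented for illustration.

```python
class FakeProductService:
    """Hypothetical remote service; counts round trips instead of using a network."""

    def __init__(self, prices: dict[int, float]):
        self._prices = prices
        self.round_trips = 0

    def get_price(self, product_id: int) -> float:
        self.round_trips += 1  # one round trip per product
        return self._prices[product_id]

    def get_prices(self, product_ids: list[int]) -> dict[int, float]:
        self.round_trips += 1  # one round trip for the whole batch
        return {pid: self._prices[pid] for pid in product_ids}

def cart_total_chatty(service: FakeProductService, cart: list[int]) -> float:
    # Anti-pattern: a remote call inside the loop.
    return sum(service.get_price(pid) for pid in cart)

def cart_total_batched(service: FakeProductService, cart: list[int]) -> float:
    # Better: fetch all prices in a single call, then compute locally.
    prices = service.get_prices(cart)
    return sum(prices[pid] for pid in cart)

catalog = {pid: 1.0 + pid for pid in range(100)}
cart = list(range(50))

chatty = FakeProductService(catalog)
batched = FakeProductService(catalog)
print(cart_total_chatty(chatty, cart), cart_total_batched(batched, cart))
print(chatty.round_trips, batched.round_trips)
```

The chatty version makes 50 round trips for a 50-item cart; the batched version makes one regardless of cart size, while computing the same total.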

Example Recommendations:

  • Use batch queries and prepared statements

  • Implement caching for frequently accessed data

  • Avoid excessive session state storage

  • Apply proper pagination techniques
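Caching frequently accessed, rarely changing data (country lists, configuration, product categories) avoids repeated database hits. A minimal in-process sketch using Python's functools.lru_cache is shown below; load_country_names and its query counter are hypothetical, and a real system would also need an invalidation strategy or time-to-live, plus a shared cache if several servers are involved.

```python
from functools import lru_cache

db_queries = 0  # counts simulated database round trips

@lru_cache(maxsize=128)
def load_country_names(region: str) -> tuple[str, ...]:
    """Hypothetical lookup of reference data; the body stands in for a DB query."""
    global db_queries
    db_queries += 1
    data = {"EU": ("France", "Germany", "Spain"), "NA": ("Canada", "Mexico", "USA")}
    return data.get(region, ())

for _ in range(1_000):  # 1,000 requests for the same reference data
    load_country_names("EU")

print(db_queries, load_country_names.cache_info())
```

Only the first call reaches the simulated database; the remaining 999 are served from memory. Returning an immutable tuple keeps callers from accidentally modifying the cached value.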

Step 3: Client-Side Investigation

What to Do?

  • Static review of front-end code

  • Analyze network communication

How to Do It?

  • Use browser developer tools

  • Conduct architectural reviews

Example Checks:

  • Multiple unnecessary service calls for a single operation

  • Inefficient use of AngularJS watchers ($watch, $watchCollection)

Example Recommendations:

  • Use coarse-grained service calls to minimize network round trips

  • Follow the front-end framework's performance best practices (e.g., keeping the number of watchers and bindings low)
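The benefit of coarse-grained calls follows from simple arithmetic: on a high-latency link, round-trip time rather than server processing can dominate. The model below is a rough sketch with hypothetical numbers (80 ms round trip, 20 ms of server work per call), and it assumes the fine-grained calls are issued one after another, as happens when each call depends on the result of the previous one.

```python
def fine_grained_ms(calls: int, rtt_ms: float, server_ms_per_call: float) -> float:
    """Sequential calls: each one pays a full round trip plus its server work."""
    return calls * (rtt_ms + server_ms_per_call)

def coarse_grained_ms(calls_merged: int, rtt_ms: float, server_ms_per_call: float) -> float:
    """One combined call: a single round trip, but the server still does all the work."""
    return rtt_ms + calls_merged * server_ms_per_call

fine = fine_grained_ms(calls=6, rtt_ms=80, server_ms_per_call=20)
coarse = coarse_grained_ms(calls_merged=6, rtt_ms=80, server_ms_per_call=20)
print(fine, coarse)
```

With these numbers the fine-grained version takes 600 ms and the coarse-grained one 200 ms; the server does the same work in both cases, so the saving comes entirely from the eliminated round trips.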


Investigation for Scalability Issues

What to Do?

  • Conduct load testing

  • Perform an architectural review if load testing indicates bottlenecks

How to Do It?

  • Use load testing tools (e.g., LoadRunner, JMeter, OpenSTA)

  • Conduct manual small-scale user load tests for initial investigation

  • Perform architectural analysis
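Before reaching for a full load-testing tool, a small script can show whether latency degrades as concurrency rises. The sketch below uses Python's concurrent.futures thread pool against a hypothetical handle_request function that serializes on a shared lock, imitating a bottleneck such as a single database connection; against a real system, the function body would be replaced by a call to the scenario under test.

```python
import statistics
import threading
import time
from concurrent.futures import ThreadPoolExecutor

_shared_resource = threading.Lock()  # hypothetical bottleneck (e.g. one DB connection)

def handle_request() -> None:
    """Simulated request: 10 ms of work that only one caller can do at a time."""
    with _shared_resource:
        time.sleep(0.01)

def timed_call() -> float:
    start = time.perf_counter()
    handle_request()
    return time.perf_counter() - start

def run_load(concurrency: int, total_requests: int = 40) -> dict[str, float]:
    """Send total_requests calls using `concurrency` threads; report throughput and latency."""
    started = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = list(pool.map(lambda _: timed_call(), range(total_requests)))
    elapsed = time.perf_counter() - started
    return {
        "throughput_rps": total_requests / elapsed,
        "mean_s": statistics.mean(latencies),
        "max_s": max(latencies),
    }

for users in (1, 4, 8):
    print(users, run_load(users))
```

If throughput stays roughly flat while mean latency grows as simulated users are added, requests are queuing on a shared resource, which is exactly the pattern the lock produces here and a signal that an architectural review is warranted.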

Example Recommendations:

  • Improve architecture (e.g., decoupling components, asynchronous processing)

  • Add more nodes with load balancing

  • Move to a 64-bit platform if a 32-bit address space is limiting available memory
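Asynchronous processing decouples accepting a request from doing the slow work behind it. The sketch below uses Python's standard queue and threading modules; place_order, the order IDs and the 50 ms of "slow work" are hypothetical. In production this role is often played by a message broker rather than an in-process queue, so that work survives restarts and can be spread across nodes.

```python
import queue
import threading
import time

work_queue: "queue.Queue[int]" = queue.Queue()
processed: list[int] = []

def worker() -> None:
    """Background consumer: does the slow part of each order."""
    while True:
        order_id = work_queue.get()
        time.sleep(0.05)  # stand-in for slow work (e.g. invoicing, sending email)
        processed.append(order_id)
        work_queue.task_done()

def place_order(order_id: int) -> str:
    """Request handler: enqueue the work and return immediately instead of blocking."""
    work_queue.put(order_id)
    return f"order {order_id} accepted"

threading.Thread(target=worker, daemon=True).start()

start = time.perf_counter()
for order_id in range(5):
    place_order(order_id)
accept_time = time.perf_counter() - start  # how long the callers waited

work_queue.join()  # wait for the background work to finish (demo only)
print(f"accepted in {accept_time:.4f}s, processed {sorted(processed)}")
```

Callers wait only for the enqueue, while the 250 ms of total work happens in the background; scaling then becomes a matter of adding consumers rather than making every request faster.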

Fact Collection for Performance Analysis

To find the root cause and provide recommendations, an architect must collect relevant data:

Define Performance Objectives

  • Expected response time

S.No. | Scenario/Page | Response Time (seconds)
1     |               |
2     |               |

  • Required throughput

S.No. | Requests/second | Transactions/second | Bytes/second
1     |                 |                     |
2     |                 |                     |

  • Acceptable resource utilization

S.No. | CPU Utilization (%) | Memory Utilization (%) | Network I/O
1     |                     |                        |
2     |                     |                        |

  • Key performance workload scenarios

S.No. | Scenario | Description
1     |          |
2     |          |
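The throughput row can often be derived from the other objectives rather than guessed. One common approach, sketched here as an assumption rather than part of the original template, is the interactive response time law: with N concurrent users, a target response time R and an average "think time" Z between requests, each user issues one request every R + Z seconds, so the system must sustain X = N / (R + Z) requests per second. The user count and timings below are hypothetical.

```python
def required_throughput_rps(concurrent_users: int,
                            response_time_s: float,
                            think_time_s: float) -> float:
    """Interactive response time law: X = N / (R + Z)."""
    return concurrent_users / (response_time_s + think_time_s)

# Hypothetical objective: 150 concurrent users, a 2 s target response time,
# and users pausing about 28 s between requests.
print(required_throughput_rps(150, 2.0, 28.0))
```

For these inputs the required throughput is 5 requests per second; shorter think times (for example, automated clients) raise it sharply.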

Gather Inputs from Various Teams

  • Development Team: Application details and known bottlenecks

S.No. | Question | Answer
1  | What is the performance problem? |
2  | Is the initial loading of the page or application slow? |
3  | Is every request to a page or scenario too slow? If so, which scenarios? |
4  | Does the server crash, go down, or become unavailable often? |
5  | If yes to Q4, at how many users does the server crash? |
6  | If yes to Q4, does it happen when a specific page/scenario is accessed? |
7  | Is caching used in the DB access layer of the application? |
8  | Is caching used in the application layer? |
9  | Does the application use an ORM tool? |
10 | Is pagination implemented in the application? |
11 | If pagination is implemented, is page data fetched from the database on every request or from the session? |
12 | Is remoting (RMI, Remote EJB, .NET Remoting) used in the application? |
13 | Is the application server colocated with the database server on the same machine/subnetwork? |
14 | Is load balancing used in the system? If so, hardware or software load balancing? |
15 | Is table partitioning used? |
16 | Are there long-running transactions (e.g., payment gateway calls or transactions spanning many systems)? |
17 | Are there federated transactions? |

  • IT Operations Team: User load and infrastructure constraints

Inputs related to various systems (servers)

S.No. | Server (Web/App/DB) | Software/Application Deployed | Application Technology | Physical Memory | Processor | Hard Disk Type | Memory Utilization (Average) | Memory Utilization (Peak/Max) | Processor Utilization (Average) | Processor Utilization (Peak)
1 | | | | | | | | | |
2 | | | | | | | | | |

Inputs related to user load

S.No. | Load Type | Concurrent | Per Day (any time in a day) | Dominating Scenario/Use Case
1 | Maximum number of users | | |
2 | Minimum number of users | | |
3 | Average number of users (most common) | | |
4 | Maximum concurrent user load beyond which performance problems appear | | |

  • Business Users: Feedback on performance issues

S.No. | Use Case/Scenario | Problem | Time to Complete Scenario (Max) | Time to Complete Scenario (Most Common) | Time to Complete Scenario (Min)
1 | | | | |
2 | | | | |

  • Testing Team: Performance test results

S.No. | Page/Screen/Transaction | Load/Response Time in Seconds (Max) | Load/Response Time in Seconds (Average) | Load/Response Time in Seconds (Min)
1 | | | |
2 | | | |

  • Database Administrators (DBAs): Query performance and indexing strategies

S.No. | Use Case | Query/Procedure | Execution Time in Seconds (Max) | Execution Time in Seconds (Average) | Execution Time in Seconds (Min)
1 | | | | |
2 | | | | |
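The DBA table asks for maximum, average and minimum execution times per query. One simple way to fill it is to run each query several times and record the spread, as in the sketch below; it uses an in-memory SQLite database and a hypothetical orders table so it runs anywhere, whereas against a real database the same loop would go through that database's own driver, ideally on production-like data volumes.

```python
import sqlite3
import statistics
import time

def time_query(conn: sqlite3.Connection, sql: str, params=(), runs: int = 20) -> dict[str, float]:
    """Execute a query `runs` times and return max/average/min duration in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        conn.execute(sql, params).fetchall()  # fetch so the full result is produced
        durations.append(time.perf_counter() - start)
    return {"max": max(durations), "average": statistics.mean(durations), "min": min(durations)}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)")
conn.executemany("INSERT INTO orders (customer_id, total) VALUES (?, ?)",
                 [(i % 500, float(i)) for i in range(50_000)])

result = time_query(conn, "SELECT SUM(total) FROM orders WHERE customer_id = ?", (7,))
print({name: f"{seconds:.6f}" for name, seconds in result.items()})
```

A large gap between the minimum and maximum often points to contention, caching effects or plan changes, which is worth noting alongside the numbers.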

Conclusion

Investigating performance issues requires a structured approach, starting from database and network layers before moving to application and client-side code. A well-defined investigation strategy ensures that the root cause is identified and effective solutions are applied instead of temporary fixes.

By following best practices and using appropriate tools, software teams can diagnose and address performance bottlenecks efficiently, leading to a more stable and scalable application.


