Performance is one of the most critical quality attributes of software systems. Often, performance-related issues are reported or detected late in the development lifecycle, making root cause analysis crucial. Without a structured approach, project teams may end up investigating random areas and applying arbitrary fixes, which often do not work.
Common Types of Performance Issues
Understanding the distinction between different performance issues is essential, as the term "performance issue" is often used broadly. Below are the key types of performance problems:
1. Response Time Issues
This occurs when an application takes too long to respond, making it difficult for users to complete their tasks efficiently. A lack of well-defined response time requirements can lead to disagreements about whether an issue exists.
For example, some companies set internal benchmarks such as:
2 seconds for login pages
9 seconds for other normal pages
A maximum of 2 seconds for any database query to execute
However, these are not universal standards. The acceptable response time depends on factors such as:
Network latency
Response time from dependent systems
Database performance
Common Causes of Poor Response Time:
Network latency
Hardware limitations
Software inefficiencies (e.g., poor architecture, inefficient code algorithms)
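Once response time requirements are written down, measured timings can be checked against them mechanically. The following is a minimal sketch, not a prescribed method: the budget values simply mirror the example benchmarks above, while the category names, the sample timings, and the choice of the 95th percentile are assumptions made for illustration.

```python
import math

# Hypothetical budgets in seconds, mirroring the example benchmarks above.
BUDGETS = {"login": 2.0, "other_page": 9.0, "db_query": 2.0}

def percentile(samples, pct):
    """Return the nearest-rank percentile (pct in 0..100) of a non-empty list."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]

def check_budgets(measurements, budgets=BUDGETS, pct=95):
    """Map each category to (observed percentile, budget, within_budget)."""
    report = {}
    for category, samples in measurements.items():
        observed = percentile(samples, pct)
        budget = budgets[category]
        report[category] = (observed, budget, observed <= budget)
    return report

# Made-up sample timings in seconds; real data would come from logs or a test tool.
measurements = {
    "login": [0.8, 1.1, 0.9, 1.4, 3.2],
    "db_query": [0.2, 0.3, 0.25, 0.4, 0.35],
}
report = check_budgets(measurements)
for category, (observed, budget, ok) in report.items():
    status = "OK" if ok else "EXCEEDED"
    print(f"{category}: p95={observed:.2f}s budget={budget:.2f}s {status}")
```

Using a high percentile rather than the average is a design choice: a single slow login is hidden by the mean but is exactly what users complain about.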
2. Scalability Issues
An application may perform well under normal load but slow down, stop responding, or crash as the number of users increases.
Some teams attempt to solve this by adding more memory or scaling horizontally, but if the root cause is poor architecture, design, or inefficient code, these solutions will only delay the inevitable.
Common Causes of Scalability Issues:
Insufficient hardware resources
Poor architecture & design
Inefficient code
3. Availability Issues
A system may crash intermittently or after a specific period. Sometimes, these issues stem from robustness problems rather than performance constraints. However, in many cases, performance problems such as memory leaks contribute to system instability.
Common Causes of Availability Issues:
Memory leaks in the code
Poor application robustness
Hardware failures
Network issues
How to Investigate Performance Issues in Software Applications
Before beginning an investigation, it is essential to ask the right questions, as performance issues reported by end users often lose clarity by the time they reach developers or architects.
Key Questions to Ask:
How many users can comfortably work on the application?
What are the maximum, minimum, and average user loads? How many users are concurrent, and how many use the application in total over a day?
Which specific use cases or scenarios exhibit performance issues?
What kind of performance problems are being reported (e.g., long initial loading, slow subsequent requests, server crashes under high user load)?
What are the quantitative performance metrics (e.g., page load takes 5 minutes, server crashes with 25 concurrent users)?
What are the server statistics during peak load (CPU utilization, memory usage, disk I/O, etc.)?
What performance level was the application designed for?
Since most performance issues originate from database operations and network round trips, investigations should typically begin there, assuming capacity and latency issues have already been checked.
Investigation for Response Time Issues
Step 1: Database Investigation
What to Do?
Database profiling
Static review of database queries and stored procedures
How to Do It?
Use static analysis tools
Conduct a database architecture review (queries, indexes, configurations)
Utilize DB profiling tools (e.g., SQL Profiler for MSSQL, AWR reports for Oracle)
Example Checks:
Query execution time under single-user and concurrent-user scenarios
Proper usage of indexes
Avoidance of full table scans and Cartesian products
Minimization of SELECT * queries
Example Recommendations:
Use indexes on frequently queried columns
Optimize joins to prevent unnecessary Cartesian products
Avoid excessive or redundant indexes
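The index and full-table-scan checks above can be illustrated with a query plan comparison. This sketch uses SQLite through Python's sqlite3 module only because it runs anywhere; the orders table and the index name are invented for the example, and other databases expose the same idea through their own plan tools.

```python
import sqlite3

def query_plan(conn, sql, params=()):
    """Return SQLite's EXPLAIN QUERY PLAN detail text for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The human-readable detail is the last column of each plan row.
    return " | ".join(row[-1] for row in rows)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 500, i * 1.5) for i in range(10_000)],
)

sql = "SELECT id, total FROM orders WHERE customer_id = ?"

# Without an index on customer_id, SQLite has to scan the whole table.
plan_before = query_plan(conn, sql, (42,))

# After adding the index, the same query becomes an index search.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = query_plan(conn, sql, (42,))

print("before:", plan_before)
print("after: ", plan_after)
```

The exact plan wording varies between SQLite versions, but the first plan reports a scan and the second names the index.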
Step 2: Web/App Server-Side Investigation
What to Do?
Static code review for architecture and design flaws
Performance testing for different scenarios
Memory profiling
Monitoring system statistics (CPU, memory, etc.)
How to Do It?
Use static analysis tools
Employ profilers
Conduct an architectural review
Example Checks:
Excessive remote calls in loops
Frequent fetching of common data from the database instead of caching it
Heavy session state usage
Inefficient pagination techniques
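The first check, remote calls inside loops, is often the easiest to spot and to quantify. The sketch below counts round trips for a per-item lookup versus a single batched query. The CountingDb wrapper, the users table, and both helper functions are hypothetical; an in-memory SQLite database stands in for a remote one.

```python
import sqlite3

class CountingDb:
    """Thin wrapper that counts round trips; stands in for a remote database."""

    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.round_trips = 0

    def query(self, sql, params=()):
        self.round_trips += 1
        return self.conn.execute(sql, params).fetchall()

def names_one_by_one(db, user_ids):
    """Anti-pattern: one query per id inside a loop."""
    result = {}
    for uid in user_ids:
        rows = db.query("SELECT name FROM users WHERE id = ?", (uid,))
        if rows:
            result[uid] = rows[0][0]
    return result

def names_batched(db, user_ids):
    """One parameterised IN query for the whole list of ids."""
    ids = list(user_ids)
    if not ids:
        return {}
    placeholders = ", ".join("?" for _ in ids)
    rows = db.query(
        f"SELECT id, name FROM users WHERE id IN ({placeholders})", ids
    )
    return dict(rows)

db = CountingDb()
db.conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
db.conn.executemany(
    "INSERT INTO users VALUES (?, ?)", [(i, f"user{i}") for i in range(100)]
)

wanted = range(10, 60)
slow = names_one_by_one(db, wanted)
trips_loop = db.round_trips

db.round_trips = 0
fast = names_batched(db, wanted)
trips_batch = db.round_trips

print(f"loop: {trips_loop} round trips, batched: {trips_batch} round trip")
```

Both functions return the same data; only the number of round trips differs. For very long id lists, the batch would need to be split into chunks to stay within the database driver's parameter limit.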
Example Recommendations:
Use batch queries and prepared statements
Implement caching for frequently accessed data
Avoid excessive session state storage
Apply proper pagination techniques
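As an illustration of the caching recommendation, the following sketch wraps an expensive lookup in a small time-based cache. The ttl_cache decorator and load_country_list function are invented for this example; it is deliberately minimal (no size limit, not thread-safe), and a production system would more likely use an established caching library or a shared cache.

```python
import time
from functools import wraps

def ttl_cache(seconds, clock=time.monotonic):
    """Cache results per argument tuple for `seconds`; an illustrative helper."""
    def decorator(func):
        store = {}

        @wraps(func)
        def wrapper(*args):
            now = clock()
            hit = store.get(args)
            if hit is not None and now - hit[0] < seconds:
                return hit[1]  # still fresh: skip the expensive call
            value = func(*args)
            store[args] = (now, value)
            return value

        return wrapper
    return decorator

calls = {"count": 0}

@ttl_cache(seconds=60)
def load_country_list(region):
    """Hypothetical expensive lookup; the counter stands in for a database hit."""
    calls["count"] += 1
    return [f"{region}-country-{i}" for i in range(3)]

first = load_country_list("eu")
second = load_country_list("eu")  # served from the cache
print(f"database hits after two calls: {calls['count']}")
```

The expiry time is the main design decision: it bounds how stale the cached data may become, so it should be chosen per data set rather than globally.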
Step 3: Client-Side Investigation
What to Do?
Static review of front-end code
Analyze network communication
How to Do It?
Use browser developer tools
Conduct architectural reviews
Example Checks:
Multiple unnecessary service calls for a single operation
Inefficient use of AngularJS watchers ($watch, $watchCollection)
Example Recommendations:
Use coarse-grained service calls to minimize network round trips
Optimize UI/UX frameworks by following best practices
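Browser developer tools can export the network log of a page as a HAR file, which makes repeated service calls easy to count offline. The sketch below summarizes such a log by endpoint. The sample entries and URLs are made up, and only the HAR fields used here (log.entries, request.method, request.url, time) are assumed to be present.

```python
from collections import defaultdict
from urllib.parse import urlsplit

def summarize_har(har):
    """Group HAR entries by (method, path); return call counts and total ms."""
    summary = defaultdict(lambda: {"calls": 0, "total_ms": 0.0})
    for entry in har["log"]["entries"]:
        request = entry["request"]
        key = (request["method"], urlsplit(request["url"]).path)
        summary[key]["calls"] += 1
        summary[key]["total_ms"] += entry.get("time", 0.0)
    return dict(summary)

# Tiny hand-made sample in HAR shape; a real export would be read with json.load().
sample_har = {"log": {"entries": [
    {"time": 120.0, "request": {"method": "GET", "url": "https://example.test/api/user/1"}},
    {"time": 80.0, "request": {"method": "GET", "url": "https://example.test/api/settings"}},
    {"time": 85.0, "request": {"method": "GET", "url": "https://example.test/api/settings"}},
    {"time": 90.0, "request": {"method": "GET", "url": "https://example.test/api/settings"}},
]}}

summary = summarize_har(sample_har)
for (method, path), stats in sorted(summary.items()):
    if stats["calls"] > 1:
        print(f"{method} {path}: {stats['calls']} calls, {stats['total_ms']:.0f} ms total")
```

Endpoints that appear several times for a single user operation are candidates for merging into one coarse-grained call or for client-side caching.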
Investigation for Scalability Issues
What to Do?
Conduct load testing
Perform an architectural review if load testing indicates bottlenecks
How to Do It?
Use load testing tools (e.g., LoadRunner, JMeter, OpenSTA)
Conduct manual small-scale user load tests for initial investigation
Perform architectural analysis
Example Recommendations:
Improve architecture (e.g., decoupling components, asynchronous processing)
Add more nodes with load balancing
Upgrade to 64-bit architecture if memory limitations exist
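For a first, small-scale look at scalability before setting up a full load testing tool, a few lines of code can already show how throughput and latency change with the number of concurrent users. This is only a sketch: the handler below is a hypothetical stand-in that serializes all callers on one lock, and in practice it would be replaced by a call to the system under test.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

_shared_lock = threading.Lock()

def handler_with_contention():
    """Hypothetical request handler: all callers serialize on one shared lock."""
    with _shared_lock:
        time.sleep(0.005)  # stands in for work done while holding a shared resource

def run_load(handler, users, requests_per_user):
    """Run `users` concurrent workers; return (requests per second, latencies)."""
    def worker():
        timings = []
        for _ in range(requests_per_user):
            start = time.perf_counter()
            handler()
            timings.append(time.perf_counter() - start)
        return timings

    latencies = []
    started = time.perf_counter()
    with ThreadPoolExecutor(max_workers=users) as pool:
        futures = [pool.submit(worker) for _ in range(users)]
        for future in futures:
            latencies.extend(future.result())
    elapsed = time.perf_counter() - started
    return len(latencies) / elapsed, latencies

for users in (1, 4, 8):
    throughput, latencies = run_load(handler_with_contention, users, requests_per_user=5)
    worst_ms = max(latencies) * 1000
    print(f"{users} users: {throughput:.0f} req/s, worst latency {worst_ms:.1f} ms")
```

When throughput stays roughly flat while latency grows with the user count, as it tends to with a serialized handler like this one, adding nodes or memory is unlikely to help until the shared bottleneck is removed.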
Fact Collection for Performance Analysis
To find the root cause and provide recommendations, an architect must collect relevant data:
Define Performance Objectives
Expected response time
S.No. | Scenario/page | Response time (seconds) |
1 | | |
2 | | |
Required throughput
S.No. | Requests/second | Transactions/second | Bytes/second |
1 | | | |
2 | | | |
Acceptable resource utilization
S.No. | CPU utilisation (%) | Memory utilisation (%) | Network I/O |
1 | | | |
2 | | | |
Key performance workload scenarios
S.No. | Scenario | Description |
1 | | |
2 | | |
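The response time, throughput, and user load objectives are not independent. For an interactive system they are linked by Little's Law, N = X × (R + Z), where N is the number of concurrent users, X the throughput, R the response time, and Z the user's think time between requests. The sketch below applies it as a quick consistency check on the stated objectives; all numbers are made up for illustration.

```python
def required_concurrency(throughput_per_sec, response_time_sec, think_time_sec=0.0):
    """Little's Law for an interactive system: N = X * (R + Z)."""
    return throughput_per_sec * (response_time_sec + think_time_sec)

def sustainable_throughput(concurrent_users, response_time_sec, think_time_sec=0.0):
    """Rearranged form of the same law: X = N / (R + Z)."""
    return concurrent_users / (response_time_sec + think_time_sec)

# Made-up objectives: 50 requests/s, 2 s response time, 8 s think time per user.
users = required_concurrency(50, response_time_sec=2.0, think_time_sec=8.0)
print(users)  # 500.0

# The reverse check: 500 concurrent users with the same timings.
rate = sustainable_throughput(500, response_time_sec=2.0, think_time_sec=8.0)
print(rate)  # 50.0
```

If the objectives collected from different teams do not satisfy this relationship even roughly, at least one of them has probably been measured or defined differently, which is worth clarifying before testing begins.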
Gather Inputs from Various Teams
Development Team: Application details and known bottlenecks
S.No. | Question | Answer |
1 | What is the performance problem? | |
2 | Is it the initial loading of the page or application that is slow? | |
3 | Is every request to a page or scenario too slow? Which scenarios? | |
4 | Does the server often crash, go down, or become unavailable? | |
5 | If the answer to Q4 is yes, at how many users does the server crash? | |
6 | If the answer to Q4 is yes, does it happen when a specific page or scenario is accessed? | |
7 | Is caching used in the DB access layer of the application? | |
8 | Is caching used in the application layer? | |
9 | Does the application use an ORM tool? | |
10 | Is pagination implemented in the application? | |
11 | If pagination is implemented, is page data fetched from the database or from the session each time? | |
12 | Is remoting (RMI, remote EJB, .NET Remoting) used in the application? | |
13 | Is the application server colocated with the database server on the same machine or subnetwork? | |
14 | Is load balancing used in the system? Is it hardware or software load balancing? | |
15 | Is table partitioning used? | |
16 | Are there long-running transactions, such as payment gateway calls or access to many systems? | |
17 | Are there federated transactions? | |
IT Operations Team: User load and infrastructure constraints
S.No. | Web/App/DB server | Software/application deployed | Application technology | Physical memory | Processor | Type of hard disk | Memory utilisation (average) | Memory utilisation (peak/max) | Processor utilisation (average) | Processor utilisation (peak) |
1 | | | | | | | | | | |
2 | | | | | | | | | | |
S.No. | Load type | Concurrent | Per day (any time in a day) | Dominating scenario/use case |
1 | Maximum number of users | | | |
2 | Minimum number of users | | | |
3 | Average number of users (most common) | | | |
4 | Maximum concurrent user load beyond which performance problems appear | | | |
Business Users: Feedback on performance issues
S.No. | Use case/scenario | Problem | Time to complete scenario (max) | Time to complete scenario (most common) | Time to complete scenario (min) |
1 | | | | | |
2 | | | | | |
Testing Team: Performance test results
S.No. | Page/screen/transaction | Response time in seconds (max) | Response time in seconds (average) | Response time in seconds (min) |
1 | | | | |
2 | | | | |
Database Administrators (DBAs): Query performance and indexing strategies
S.No. | Use case | Query/procedure | Execution time in seconds (max) | Execution time in seconds (average) | Execution time in seconds (min) |
1 | | | | | |
2 | | | | | |
Conclusion
Investigating performance issues requires a structured approach, starting from database and network layers before moving to application and client-side code. A well-defined investigation strategy ensures that the root cause is identified and effective solutions are applied instead of temporary fixes.
By following best practices and using appropriate tools, software teams can diagnose and address performance bottlenecks efficiently, leading to a more stable and scalable application.