For any consumer-facing application, site speed is an important element of the overall user experience, which extends to all platforms where your app is available, including the web, iOS, and Android. Studies have shown that site speed impacts user engagement, revenue, and other key business metrics. What steps can you put in place to ensure your page load time (PLT) stays within the desired threshold?
It’s important to note that performance has a tendency to degrade over time—this is something we’ve experienced firsthand at LinkedIn. These degradations are not usually dramatic, single-event occurrences; after all, steps like A/B testing or canary analysis are designed to catch those types of flaws prior to production rollout. Rather, site slowdowns tend to occur over a period of time due to smaller-scale latency leaks: little changes in code, browser configurations, or other areas that add up over time.
The following steps can help you identify and plug latency leaks so that your site performance stays consistent and the user experience remains positive.
Build good back-end monitoring
This should be the first line of defense against performance degradations and is also probably the most commonly implemented. When developers are writing code, they should build in back-end monitoring to measure their code’s performance in a production environment. This helps catch issues that can be solved with code-design optimizations like caching and capacity planning.
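As a rough illustration, server-side timing instrumentation is often added with a decorator or middleware that records how long each handler takes. The sketch below uses a hypothetical in-memory store; a production system would export these samples to a time-series database for dashboards and alerting.

```python
import time
from collections import defaultdict
from functools import wraps

# Hypothetical in-memory metrics store; a real system would ship these
# samples to a metrics backend rather than keep them in process memory.
latency_samples = defaultdict(list)

def timed(endpoint):
    """Record wall-clock latency for each call to the decorated handler."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                latency_samples[endpoint].append(time.perf_counter() - start)
        return wrapper
    return decorator

@timed("feed")
def render_feed(member_id):
    # Placeholder for real request handling.
    return f"feed for {member_id}"

render_feed(42)
print(len(latency_samples["feed"]))  # prints 1: one sample recorded
```

Instrumenting at this level is what makes code-design optimizations like caching measurable: you can compare the latency distribution before and after the change.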
While this step is important as a baseline, it generally isn’t sufficient by itself to catch all performance degradations because it doesn’t measure client-side activities such as page rendering. To fully understand the user experience, you need to move outside the data center walls.
Synthetic client monitoring
Synthetic client monitoring is a service, typically offered by third-party providers, that tests your application across a variety of controlled devices and configurations to help catch client-side issues. It can be a useful indicator of client-side site speed problems because it gives you an idea of how your code performs on real devices.
However, for a large-scale application like LinkedIn, the shortcoming of this approach is that it is, in fact, synthetic—meaning that testing is done over a limited number of devices, networks, and use cases. For instance, how does a feature like People You May Know, which draws on data from each member’s knowledge graph, perform for a member with 100 connections versus a member with 5,000 connections? The latter graph has many more edges—will site speed suffer as a result? These are the types of questions that can be difficult to answer with synthetic client monitoring.
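The core of a synthetic probe can be sketched as a loop that loads the page under a few fixed configurations and reports a representative timing per configuration. Everything here is illustrative: the fetcher is a stand-in, and real providers drive actual devices and browsers rather than a function call.

```python
import time
from statistics import median

def synthetic_check(fetch, configs, runs=3):
    """Run fetch(config) several times per configuration and report the
    median load time, mimicking what a synthetic-monitoring probe does."""
    results = {}
    for config in configs:
        timings = []
        for _ in range(runs):
            start = time.perf_counter()
            fetch(config)
            timings.append(time.perf_counter() - start)
        results[config] = median(timings)
    return results

# Stand-in fetcher for illustration; a real probe would issue an HTTP
# request or drive a headless browser against the page under test.
def fake_fetch(config):
    time.sleep(0.01 if config == "desktop" else 0.02)

report = synthetic_check(fake_fetch, ["desktop", "mobile-3g"])
```

Note how the configuration list is fixed and small: that is exactly the limitation described above. A member with 5,000 connections is a use case no probe catalog will enumerate for you.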
Real user monitoring (RUM)
Real user monitoring instruments actual user sessions, collecting timing data directly from the browsers and devices of the people using your application. Even if you start at a smaller scale, such as 20 percent of all users, RUM is still useful because it tells you how your application is performing in the real world, and how users are experiencing it. When using RUM, it’s important to remember that short-term performance data is inherently noisy, so it’s often most effective to look at something like performance at the 50th or 90th percentile. And any time you’re working with raw data, it’s also imperative to ensure that appropriate privacy and security measures are in place to protect users’ information.
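To show why percentiles tame the noise, the sketch below simulates a batch of RUM page-load samples (real data would come from a beacon fired using the browser's Navigation Timing API; the lognormal distribution is just a plausible stand-in) and reduces them to p50 and p90.

```python
import random
from statistics import quantiles

# Simulated RUM page-load samples in milliseconds. Real samples would be
# collected from users' browsers; lognormal is an assumed, illustrative shape.
random.seed(7)
samples = [random.lognormvariate(7.2, 0.5) for _ in range(10_000)]

# Individual samples vary wildly; percentiles give a stable daily picture.
cuts = quantiles(samples, n=100)  # 99 cut points across the distribution
p50, p90 = cuts[49], cuts[89]     # 50th and 90th percentiles
```

Tracking p50 tells you about the typical experience, while p90 surfaces the slow tail, which is often where latency leaks show up first.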
Advanced monitoring activities
Once you have a solid monitoring system in place, including RUM, you can engage in more advanced activities to optimize site speed. These include things like building an automated alerting system for performance regressions, conducting multidimensional root cause analyses, and running exploratory optimization tests.
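An automated regression alert can be as simple as comparing today's p90 against an agreed baseline and flagging when the gap exceeds a threshold. This is a minimal sketch with assumed sample data and a 10 percent threshold; a production alerting system would also account for traffic volume and seasonality.

```python
from statistics import quantiles

def p90(samples):
    """90th percentile: quantiles(n=10) yields 9 cut points; index 8 is p90."""
    return quantiles(samples, n=10)[8]

def check_regression(baseline, current, threshold=0.10):
    """Alert (return True) if current p90 exceeds baseline p90 by > threshold."""
    base, cur = p90(baseline), p90(current)
    return (cur - base) / base > threshold

# Illustrative load-time samples in milliseconds.
baseline = [100, 110, 120, 130, 140, 150, 160, 170, 180, 190]
regressed = [ms * 1.3 for ms in baseline]  # a 30 percent slowdown
alert = check_regression(baseline, regressed)
```

Here `alert` is True for the 30 percent slowdown and would be False if current matched the baseline, which is the behavior an alerting pipeline needs before paging anyone.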
For example, we once had a situation at LinkedIn where page load time increased suddenly for a certain percentage of members. By delving into RUM data to conduct a thorough root-cause analysis, we found that the degradation was only experienced on a certain browser. After determining that a small change on the browser side was responsible, we were able to work with engineers from that browser to solve the issue.
In terms of optimizations, advanced monitoring also gives you the option to segment performance experience so you can tailor optimizations for specific user groups. For instance, in markets that are predominantly mobile-first where users might experience slower connection speeds, we developed LinkedIn Lite, a version of our web app with specific features like server-side rendering to improve performance.
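Segmenting the performance experience starts with grouping RUM samples by a dimension such as network type or market and comparing summaries per group. The records and segment names below are hypothetical, purely to illustrate the grouping step.

```python
from collections import defaultdict
from statistics import median

# Hypothetical RUM records: (network segment, page load time in ms).
records = [
    ("4g", 1200), ("4g", 1350), ("4g", 1100),
    ("2g", 5200), ("2g", 6100), ("2g", 4800),
]

by_segment = defaultdict(list)
for segment, ms in records:
    by_segment[segment].append(ms)

# Median load time per segment; slow segments are candidates for a
# lighter experience, such as a server-side-rendered version of the app.
medians = {seg: median(times) for seg, times in by_segment.items()}
```

A split like this is what justifies a targeted product such as a lite app: the data shows which user groups would benefit most from a pared-down experience.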
Building a performance culture
Of course, none of this work is possible if you don’t have an established performance culture in your organization. It’s important to devise performance baselines and key metrics that are agreed upon by all teams, so that everyone has the same definition of success. From there, ensure organizational buy-in on the importance of performance so that everyone understands its value and the pivotal role it plays in user experience. Crucially, this can’t be a strictly top-down or bottom-up approach; rather, the value of good site speed should be embraced both at the executive level and throughout the engineering organization. Site speed should be a business priority, and engineering teams should be equipped to carry out the work of maintaining it.
By creating a performance-centric culture and implementing a robust monitoring system, you can plug latency leaks and ensure that your site remains fast.
This article is published as part of the IDG Contributor Network.