06-12-2020 04:27 PM - edited 02-27-2024 03:10 PM
< Previous Article | Next Article > | |
Optimizing the Service Portal Typeahead search widget | Performance Best Practice for Before Query Business Rules |
This guide is written by the ServiceNow Technical Support Performance team (All Articles). We are a global group of experts who help our customers with performance issues. If you have questions about the content of this article, we will try to answer them here. However, if you have urgent questions or specific issues, please see the list of resources on our profile page: ServiceNowPerformanceGTS
The goal of this article is to address some common types of scripting issues that can cause performance problems. While ServiceNow has many low-code and no-code development options, any implementation will still require writing some code. Coding in ServiceNow provides a great deal of power to enable rich features and delight your customers. However, it also has the potential to cause serious issues if not properly designed.
ServiceNow almost exclusively uses JavaScript as the scripting language for our back-end APIs. As of Nov 30, 2023, Vancouver is the current release of ServiceNow. At this time, ServiceNow supports three different versions of JavaScript, depending on the script context: ECMAScript 2021 (ES12), ES5 Standards, and Compatibility. For details about the latest script environment, see the product documentation: [Doc Site] JavaScript modes.
On top of JavaScript, ServiceNow leverages a variety of frameworks and libraries (Angular, Prototype) as well as a variety of proprietary scripting contexts and APIs, each with its own list of available context variables and implications for performance. This guide will mostly cover the most common scripting scenarios in ServiceNow: the (scoped) [Developer Portal] Server Custom Scope or (legacy) [Developer Portal] Server Global APIs. These APIs are available from most "script" type fields in ServiceNow. We will not go too in-depth into any specific contexts. The developer site has detailed information about the various APIs, libraries and frameworks leveraged in different parts of ServiceNow ([Developer Portal] APIs, libraries, and supplemental materials).
For more technical best practices, see [Developer Portal] Scripting Technical Best Practices.
Many of these concepts are not original, and certainly not unique to ServiceNow, but they have been selected because they have been observed to be the areas where ServiceNow customers run into the most trouble. Ok, let's dig in!
This is #1 for a reason! Failure to correctly validate null/undefined variables is the most common cause of impactful scripting failures that we see in ServiceNow code. When an invalid variable is used in a GlideRecord query, the system can bring back far more rows than anyone expected, which can result in performance degradation, data corruption or data loss.
There are two main flavors of this issue: passing an undefined variable as the value of a query condition (which matches records where the field is empty), and passing an undefined variable as the field name (which, without the safety property described below, can match every record in the table).
GlideRecord is ServiceNow's main API. It allows access to the database via script.
[Developer Portal] Free Developer Course on GlideRecord
[Doc Site] ServiceNow Server Script Debugger - Feature addition: [Developer Portal] Debugging Console available in Paris!
The solutions:
• Check for all values before using them ([Developer Portal] coding best practice)
To avoid unpredictable results and warning messages, verify that variables and fields have a value before using them. Consider the following code:
var table = current.cmdb_ci.installed_on.sys_class_name;
gs.info('Table is: ' + table);
Any additional statements which use the variable table may throw warning messages in the system log if the value is undefined. This can happen if the cmdb_ci field is empty or if the installed_on field in the CI is empty. The following code demonstrates a better way to verify that the table variable has a value before using it:
var table = current.cmdb_ci.installed_on.sys_class_name;
if (table) {
    gs.info('Table is: ' + table);
} else {
    gs.info('Warning: table is undefined');
}
Other methods include:
JSUtil.notNil() - [Developer Portal] True if the item exists and is not empty.
GlideElement.nil() - [Developer Portal] Scoped GlideElement nil()
GlideSystem.nil() - [Developer Portal] Scoped GlideSystem.nil()
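Outside of the Glide APIs, the same guard pattern can be written in plain JavaScript. This is a generic sketch (the data shape is hypothetical, not a Glide object): walk a dotted path and stop at the first missing link instead of letting undefined leak downstream.

```javascript
// Walk a dotted path safely, returning undefined if any link is missing.
function safeGet(obj, path) {
  var parts = path.split('.');
  var value = obj;
  for (var idx = 0; idx < parts.length; idx++) {
    if (value === null || value === undefined) {
      return undefined; // a link in the chain is missing - bail out early
    }
    value = value[parts[idx]];
  }
  return value;
}

var record = { cmdb_ci: { installed_on: { sys_class_name: 'cmdb_ci_server' } } };
console.log(safeGet(record, 'cmdb_ci.installed_on.sys_class_name')); // 'cmdb_ci_server'
console.log(safeGet({ cmdb_ci: null }, 'cmdb_ci.installed_on.sys_class_name')); // undefined
```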
• Change the behavior through a property change
The property "glide.invalid_query.returns_no_rows" changes the behavior of addQuery(badFieldName, goodCondition) so that it returns no results, very efficiently, by adding 0 = 1 to the query conditions. You might need to add this property to your instance, since it does not exist by default. Set the value to true to get the behavior change.
var undf={};
var test1 = new GlideRecord("task");
test1.addQuery("short_description", undf.field); //Will return all records where short_description is NULL
test1.query();
gs.info(test1.getRowCount()); // 2258 on my demo
var test2 = new GlideRecord("task");
test2.addQuery(undf.field, 1); // Will return nothing and query will be super fast: SELECT ... FROM task task0 WHERE 0 = 1
test2.query();
gs.info(test2.getRowCount()); // 0
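The root cause in both cases is ordinary JavaScript behavior that you can reproduce outside the platform. The buildCondition helper below is purely illustrative (it is not how addQuery() is implemented internally); the point is that undefined flows silently into whatever you call next.

```javascript
// Reading a property that was never assigned does not throw; it yields undefined.
var undf = {};
console.log(undf.field); // undefined - no error is raised

// Illustrative only: how an undefined value silently turns into a nonsense condition.
function buildCondition(field, value) {
  return field + ' = ' + value;
}
console.log(buildCondition('short_description', undf.field)); // 'short_description = undefined'
```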
• Use the new GlideQuery API (Available in Paris)
- GlideQuery is a newer feature (so far used primarily internally) that helps developers write better, safer ServiceNow Glide code.
- GlideQuery effectively sits on top of GlideRecord and provides additional error checking/improved syntax. As a result, for well written queries, we wouldn’t expect to see any difference in performance between the two classes as the same GlideRecord() queries should be run in both cases. During performance testing, the product team has documented some single digit millisecond performance overhead that we assume is due to the additional logic added by GlideQuery before it calls GlideRecord.
See [Doc Site] GlideQuery - Scoped, Global for official ServiceNow documentation
Large arrays or objects can cause several specific failure points, the most common of which is running the JVM low on memory. The Java Virtual Machine in which ServiceNow runs has 2GB of memory total (heap space). At any given time, 20% to 80% of memory might be in use. Large implementations of ServiceNow can often stay above 60% memory-in-use for much of their business day. This means there is less available memory at any given moment, and developers need to be careful to avoid creating memory scarcity. A GC floor above 80% is generally considered dangerously high.
This all begs the question: how much memory can I safely use in my code? How big is too big? As a very rough frame of reference, an array of 160,000 sys_ids will take a little under 10MB in memory, or about 0.5% of total memory in the system. Even if all 16 of your semaphores were running that process at once, only about 8% of total memory would be locked up. That is probably a safe budget for any size of customer.
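The estimate above can be checked with back-of-the-envelope arithmetic (assumptions: a 2GB heap, sys_ids stored as 32-character UTF-16 strings at 2 bytes per character, array overhead ignored):

```javascript
// Back-of-the-envelope memory math for one array of 160,000 sys_ids.
var heapBytes = 2 * 1024 * 1024 * 1024;   // 2GB total JVM heap
var sysIdBytes = 32 * 2;                  // ~64 bytes per sys_id string (UTF-16)
var arrayBytes = 160000 * sysIdBytes;     // one array holding 160,000 sys_ids

console.log((arrayBytes / (1024 * 1024)).toFixed(1) + ' MB');           // just under 10MB
console.log((100 * arrayBytes / heapBytes).toFixed(2) + '% of heap');   // ~0.5%
console.log((16 * 100 * arrayBytes / heapBytes).toFixed(1) + '% with all 16 semaphores'); // ~8%
```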
As far as we are aware, there is no testing framework for ensuring efficient memory usage. One idea that we've had - but never tried - is to write a gs.sleep() statement at the very end of whatever script we want to test for memory. Then check memory before, during and after executing the script by looking at /stats.do. The part that says "Free percentage: 58.0" tells you how much available memory you have at any given time. If you did this on a sub-prod with little else running at the same time you ought to be able to get a pretty good idea of how much memory is being retained by your code.
There is an interesting thread in StackOverflow about how to test object sizes: [Stack Overflow] memory - JavaScript object size - Stack Overflow
Some more ways to avoid out-of-memory are listed further down in this article:
#5 Limit number of returned records when querying very large tables
#6 Running out of memory due to storing dot-walked GlideRecord fields in arrays
#8 Infinite or Very Large Loops and Recursion
There are many potential ways to avoid excessive memory use from large arrays or objects. One way (perhaps less well known and perhaps less useful) is to de-reference unneeded objects after they are used. JavaScript will attempt to garbage-collect any value that is no longer reachable from a root. If you have a very long script that builds multiple large arrays, it might be worth de-referencing each array as soon as it is no longer needed.
Here is a script example demonstrating how JavaScript handles de-referencing and garbage collection.
var o = { a: { x: 1 }, b: 2 }; // memory is retained by the object stored in o
var oa = o.a;  // a second reference to the nested object o.a
o = null;      // the outer object is now unreachable and can be collected, but the
               // nested object is still reachable through oa, so it cannot be collected yet
oa = null;     // now the nested object is unreachable too and can be garbage collected
See Reference Counting Garbage Collection
This issue can manifest in many ways and it would be impossible to define what is "too much" or "too quickly" in every context in this simple community article. However, it is still an important topic and therefore we will attempt to address it by giving an analogy and then providing some simple examples from Service Portal that demonstrate the rule.
Here's an analogy: you might think of too much work/too quickly as a "pipe problem", where you are trying to get particles through a pipe without clogging it. If the particles are big, then fewer of them fit through the pipe at the same time; if the particles are smaller, then more fit through, but you must watch how many you send at once. Ultimately you must figure out the following things: how big is the pipe (the total capacity of the system), how big are the particles (the cost of each unit of work), and how quickly are the particles being sent (the rate and concurrency of requests).
Once you answer these questions, it is a matter of designing your solution to stay well within the available capacity of the pipe.
Here is a list of some common areas where folks run into the "too much work, too quickly" issue:
Example #1 Contextual Search
Contextual Search automatically fires off an AJAX call to the server whenever someone enters characters in a text box. On the server side, a text search operation looks for any matching results for the text that was entered and returns those results. To do that, each text search query is multi-threaded at the app layer, meaning that a single text search operation can trigger up to 5 database queries at once. If a few users all do this at once, the database can quickly come under heavy load, especially if the text search implementation is not well tuned. Most database servers in ServiceNow's farm (as of July 31, 2020) have between 32 and 48 CPUs, and if enough users execute contextual search at the same time, the "pipe" of the database server can get clogged. To avoid this, if you are implementing Contextual Search on a frequently accessed page, like your Service Portal Service Catalog, you should reduce the frequency with which Contextual Search fires and improve the execution time of the search operations.
See [KB0813266] Tuning Contextual Search (CXS) for details on optimizing Contextual Search.
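The "reduce the frequency" advice above amounts to throttling the keystroke-to-search pipeline. Here is a minimal generic sketch (a hypothetical helper, not the CXS implementation) that fires a search at most once per interval, using an injected clock so the logic is easy to test:

```javascript
// Fire at most once per `intervalMs`; `nowMs` is injected for testability.
function makeThrottle(intervalMs) {
  var lastFired = -Infinity;
  return function shouldFire(nowMs) {
    if (nowMs - lastFired >= intervalMs) {
      lastFired = nowMs;
      return true;
    }
    return false;
  };
}

var throttle = makeThrottle(500);
console.log(throttle(0));    // true  - first keystroke fires a search
console.log(throttle(100));  // false - too soon, skip this keystroke
console.log(throttle(600));  // true  - interval elapsed, fire again
```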
Example #2 Too Broadly Defined Record Watchers
Record Watchers are a type of messaging trigger in ServiceNow's backend that allow a publish/subscribe model that is server-driven. When the platform registers a change to a record that meets the conditions of a Record Watcher, all subscribers are "notified" of the change. The more broadly a Record Watcher is defined, the more frequently it will trigger notifications. Each notification causes a transaction between the client and server of any user who is subscribed. Further, if a broadly defined Record Watcher appears on many pages, then you could have the makings of a potential transaction storm. For example, suppose that you build a Record Watcher that looks for any changes to a shopping cart (sc_cart) and you put this Record Watcher in your Service Catalog header. See the problem? We didn't say only the current user's shopping cart! What if 100 users all decide to put something in their carts at the same time? This is not that far fetched. It is based on actual customer scenarios where, for example, a catalog item was advertised during a company-wide meeting. The result is that each user gets a reload triggered by every other user, so each of them experiences 100 refreshes to their UI. That means 100 users with 100 refreshes each, for a total of 10,000 transactions suddenly sent to ServiceNow all at once. This will certainly cause some performance degradation for each individual user who is on the Service Catalog, but it could also overload system resources, causing system-wide performance degradation.
Make sure record watchers adhere to the recommendations in KB0639111 and all queries should adhere to the recommendations in the following community article about performance best practice for efficient query filters: [Community] Performance Best Practice for Efficient Queries - Top 10 Practices
Also see [Community] Are your auto-refreshing widgets causing instance slowdowns?
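The arithmetic in the cart example generalizes: a too-broad watcher makes traffic grow with the square of the number of active users, which a one-line sketch makes explicit.

```javascript
// Fan-out of a broadly-scoped record watcher: every subscriber is notified
// of every other subscriber's change, so transactions grow quadratically.
function watcherTransactions(activeUsers) {
  // each of `activeUsers` cart updates triggers a refresh for all `activeUsers` subscribers
  return activeUsers * activeUsers;
}
console.log(watcherTransactions(100)); // 10000 transactions from one busy moment
```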
During the course of handling an end user transaction, server-side JavaScript can be used for many things such as executing business logic or even interacting with an external endpoint. What is important, however, is to make sure that this code is executed in a manner which is as optimal as possible since the end user is waiting for a response.
The platform allows execution of code both synchronously (do it right now and wait for the results) and asynchronously (do it at some point in the future, generally within the next few seconds, and don't wait). When writing code that is triggered via a transaction, think about whether that transaction actually needs the results of the code in order to complete. If it doesn't, then you are probably better off executing your code asynchronously outside of the transaction. This gives best transactional performance, and if the transaction is triggered via a user, keeps your users happy.
Example #1
For example, let's say you need to trigger some business logic when data is modified in a particular table. The obvious way to do this is via a business rule; what is less obvious, however, is how/when the business rule should run. In short there are three options: before, after, or async.
If the business rule must execute and complete within the context of the triggering transaction (i.e. everything should be atomic, or remaining processing performed by the transaction depends on the results of the business rule), then it makes sense to use before or after. The drawback, however, is that these are synchronous and therefore run within the context of the triggering transaction, meaning that the transaction takes longer to execute (especially if your code is slow). If it's OK for the business rule to execute outside of the triggering transaction (for example once the transaction has completed), use async - the platform will automatically execute the code via a scheduled job on a background worker thread, meaning your users don't have to wait. (see [Community] Performance considerations when using ASYNC Business Rules)
Example #2
As a second example, let's consider that a transaction needs to interact with an external endpoint via REST. Again, if the transaction needs data from the endpoint before it can complete (for example the transaction is a search request which pulls results from somewhere outside of ServiceNow) then the communication has to be performed synchronously. Network communications are, however, inherently slow, meaning that the triggering transaction will likely take much longer to execute. This is especially true if the endpoint has some kind of issue, meaning it is unexpectedly slow or doesn't respond at all. If the interaction is 'fire and forget' or even if you need to process results but can do this outside of the context of the triggering transaction, then asynchronous communications are a much better solution. Asynchronous REST can be triggered via the RESTMessageV2().executeAsync() method.
However, be VERY careful to understand the implications of using executeAsync() as it does incur performance overhead due to using the scheduled job queue and becomes synchronous if used together with waitForResponse(), completely defeating the purpose!!
If nothing needs to be done with the results of the REST request processing can stop there. If, however, you want to use the results (for example to update data/drive other business logic or even implement some kind of retry mechanism) you can configure a business rule on ecc_queue which looks for records where "queue = 'input' and topic = '[a suitable topic]' and state = 'ready'" to handle this.
See the article below for a complete explanation of the performance implications of working with RESTMessageV2 and SOAPMessageV2 methods:
[KB0694711] Outbound REST Web Services RESTMessageV2 and SOAPMessageV2 execute() vs executeAsync() B...
Fundamentally the less synchronous processing you can perform, the faster transactions will complete. This has a whole range of benefits:
NOTE: You might be wondering about why we have not included a note about synchronous vs. asynchronous AJAX. Technically the method that controls synchronous AJAX is on the browser and therefore is better suited for a client-side article. However, there is an excellent article by Mark Roethof (Community MVP) published here: Go for GlideAjax (with getXMLAnswer)!
#5 Limit number of returned records when querying very large tables
For example, this:
var gr = new GlideRecord('incident');
gr.addActiveQuery();
gr.query();
while (gr.next()) {
    // do something here
}
might work fine if you have only a few thousand active incidents, but if you have 1,000,000 active incidents, the query() method has to retrieve all those records and this can take time.
In cases like this, using the setLimit() method is essential.
var gr = new GlideRecord('incident');
gr.addActiveQuery();
gr.setLimit(100);
gr.query();
while (gr.next()) {
    // do something here
}
In other cases you might just want to check whether there are any records at all that match a certain query. For example:
var gr = new GlideRecord('incident');
gr.addActiveQuery();
gr.query();
if (gr.next()) {
    // do something here
}
This is even worse. Note that the above code does nothing with the results of the query; instead of iterating over them, it simply tests whether any results were returned. The query() method, however, will cause details of all matching records to be streamed from the database to the application node, which can waste database, network and application server resources, consume a large amount of memory, and make the transaction much slower than it needs to be.
To avoid this, use setLimit(1) so that at most a single record will be returned (if there are matches) or no records will be returned (if there are no matches):
var gr = new GlideRecord('incident');
gr.addActiveQuery();
gr.setLimit(1);
gr.query();
if (gr.next()) {
    // do something here
}
Another case where this can happen is when someone uses the [Developer Portal] GlideRecord.get() method. The purpose of this method is to return one matching record given some query criteria that is assumed to result in a unique match. One might assume that the platform is optimized to handle cases where the get() method matches more than one result, but it is not.
For example, consider the following script:
var incidentGr = new GlideRecord("incident");
incidentGr.get("active", "true");
gs.info(incidentGr.sys_id);
The output of the script will be something like this (but a different "sys_id"):
*** Script: 1682fd87378c1300023d57d543990e2e
Great, it returned one result, right? Yes, it returned one result, but two queries were needed to get it - known internally as a two-pass method. The first query actually queried and returned the sys_id of every single matching record from the database and stored them in a temporary flat file on the operating system, and then the second query returned the first 100 matching records by sys_id and arbitrarily populated your GlideRecord with the first matching record. That is not very efficient. Turning on "System Diagnostics > Debug SQL (Detailed)" would show something like the following:
14:16:16.700 Time: 0:00:00.002 for: myinstance_2[glide.2] [-73339228] SELECT ... FROM task task0 WHERE task0.`sys_class_name` = 'incident' AND task0.`active` = 1 /*...*/
14:16:16.720 Time: 0:00:00.002 for: myinstance_2[glide.6] [896098098] SELECT ... FROM task task0 WHERE task0.`sys_class_name` = 'incident' AND task0.`sys_id` IN ('1682fd87378c1300023d57d543990e2e' , '4fe6310f378c1300023d57d543990ec7' , 'a40739cb378c1300023d57d543990e73' , ...97 other sys_id's...)
It would be much better to use GlideRecord.query() with setLimit(1) unless you were absolutely convinced that your query could not possibly match more than one record.
#6 Running out of memory due to storing dot-walked GlideRecord fields in arrays
One issue that happens often is when someone stores an object reference in an array when they really only need to store a primitive data type. They might do this unintentionally or they might not realize the impact it is having. For example, the following code stores a GlideElement object when only a simple String type is needed.
var arrIncident = [];
var myIncident = new GlideRecord("incident");
myIncident.query();
while (myIncident._next()) {
    arrIncident.push(myIncident.caller_id); // This is passing a GlideElement
}
A GlideElement object is actually a Java data type that has been exposed to JavaScript through the GlideRecord API. A GlideElement is many times larger than a simple 32-character sys_id string - in the range of tens of kilobytes each. By contrast, ServiceNow JavaScript stores strings as UTF-16, so a good estimate for the size of a string is the total character length x 2 bytes. Since each sys_id has 32 characters, the storage size is a meager 64 bytes.
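That size estimate is easy to verify (assumptions: strings stored as UTF-16 at 2 bytes per character, ignoring per-object overhead; the sys_id below is just an example 32-character value):

```javascript
// Rough size of one sys_id stored as a plain string.
var sysId = '46d44a5dc611227d0000455ae0302d27'; // example 32-character sys_id
var approxBytes = sysId.length * 2;             // 2 bytes per UTF-16 character
console.log(approxBytes); // 64 bytes, versus tens of kilobytes for a GlideElement
```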
Using the GlideRecord.getValue(String) method is a good way to avoid using complex data types, since it returns simple JavaScript String values instead of complex Java objects.
var arrIncident = [];
var myIncident = new GlideRecord("incident");
myIncident.query();
while (myIncident._next()) {
    if (!myIncident.caller_id.nil())
        arrIncident.push(myIncident.getValue("caller_id")); // This passes just the sys_id as a String. Much better!
}
N.B. You might notice that we have used _next() instead of next() in some of our code examples. This isn't a performance best practice; it is a way to avoid a conflict if the given table has a field named "next". There are not many tables with a field named "next", so this is a very rare edge case.
In ServiceNow the general naming convention is to use a meaningful two or three word variable name in camel case, starting with a lowercase letter. Here are some examples:
parentCI - for the Configuration Item (CI) that is the parent of another CI.
resolvedIncidents - could be a GlideRecord query of all incidents in the "Resolved" state
standardSchedule - could mean the 8-5 workday schedule excluding holidays
loopCount - use this instead of "i" to run a loop
Using unique variable names has many advantages:
• It makes code manageable. Suppose you have an error message that says "variable gr is not defined". How are you going to figure out what script that comes from? Try a global search on "gr" - you will find hundreds of unrelated results. (Yes, we know, many of those results are out-of-box code - we are working on it!)
• It makes code easy to understand. This is not just for your benefit. Be nice to the next developer who comes along - they might be able to figure out where you live.
• It helps avoid your variables being stomped on by other scripts. In JavaScript, a variable assigned without var ends up in the shared global scope, and nested functions can read and write variables from their enclosing scopes, so a common name used by two scripts can collide. This is perhaps the most insidious side effect of using common variable names: when your variable is stomped on by another script, it is hard to troubleshoot, because suddenly your "i" variable goes from 100 to 7 without any explanation. Rename common variables like "gr", "ga" and "i" to more unique names. Commonly used variable names have a tendency to get stomped on by variables in other scopes.
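Here is a plain-JavaScript sketch of the stomping effect. The sharedScope object below is an explicit stand-in for the global scope that legacy scripts share; the script names are hypothetical.

```javascript
var sharedScope = {}; // stand-in for the shared global scope

function reportScript() {
  sharedScope.gr = 'incident query'; // "gr" assigned without var lands in shared scope
  helperScript();                    // some other script runs in the meantime
  return sharedScope.gr;             // surprise: not what we stored
}

function helperScript() {
  sharedScope.gr = 'change query';   // same lazy name, same shared scope - stomp!
}

console.log(reportScript()); // 'change query', not 'incident query'
```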
References
[Community - non ServiceNow] code snippets variable and function naming
[Developer Portal] Scripting Technical Best Practices
#8 Infinite or Very Large Loops and Recursion
This can happen a lot in the CMDB or task tables when you want to use the parent/child relationships to check the whole ancestry of a record. When doing this type of operation you run the risk of getting into a very deep or infinite loop - record A has child B has child C has child A (uh oh!). Lots has been written about how to avoid this type of thing, so we won't go into depth in this article. Here are some ideas:
• Put a hard limit on recursion depth or loop iterations and stop when it is reached.
• Validate the data so that circular parent/child references cannot be created in the first place.
• Keep a list of records that have already been visited and stop as soon as you see one a second time.
Be careful while implementing that last safety measure, though, as you might end up creating a huge memory object and end up worse off than when you started. For example, suppose you are trying to avoid infinite recursion by keeping a list of items that have already been seen. If that list gets too big, then you could run the instance out of memory. So make sure that list is optimized to have a small memory footprint, and put a limit on how big it can grow.
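As a sketch, here is what a cycle-safe ancestor walk might look like in plain JavaScript. The parentOf map is hypothetical stand-in data (not a Glide API), and the walk combines both safety measures: a visited list and a hard depth limit.

```javascript
// Child -> parent map with a deliberate cycle: A -> B -> C -> A
var parentOf = { A: 'B', B: 'C', C: 'A' };

function ancestors(start, maxDepth) {
  var seen = {};   // visited list - bounded by maxDepth, so it cannot grow unchecked
  var chain = [];
  var current = parentOf[start];
  while (current && chain.length < maxDepth) {
    if (seen[current]) {
      break; // cycle detected - stop instead of looping forever
    }
    seen[current] = true;
    chain.push(current);
    current = parentOf[current];
  }
  return chain;
}

console.log(ancestors('A', 100)); // [ 'B', 'C', 'A' ] - stops when A reappears
```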
#9 Aggregation
One scripting pattern that can cause slowness is getting aggregate information on large data sets. There are a number of ways to accomplish this task and you should consider which is best for your use case.
For very large data sets or frequently accessed operations, you should always use a pre-aggregated method. There are various features in ServiceNow that support pre-aggregated approaches. The most obvious one and the most powerful is [Doc Site] Performance Analytics. By storing aggregated values in a separate data structure, you can avoid the expensive overhead of executing real-time aggregations over and over.
In terms of simple JavaScript coding for lightweight or infrequent aggregations you can use the GlideAggregate method.
Danger: by default, these methods result in grouped and sorted queries - sometimes even when you only want one result, such as when using MAX. Watch out for cases where you don't need to group or sort, and use the setGroup(false) or setOrder(false) methods appropriately.
var countActiveIncidents = new GlideAggregate("incident");
countActiveIncidents.addActiveQuery();
countActiveIncidents.addAggregate("COUNT","assignment_group");
countActiveIncidents.setOrder(false);// having an inefficient ORDER BY here might slow things down. If you don't need it, use setOrder(false)
countActiveIncidents.query();
var countResults = [];
while (countActiveIncidents.next()) {
    countResults.push(countActiveIncidents.getDisplayValue("assignment_group") + ": " + countActiveIncidents.getAggregate("COUNT", "assignment_group"));
}
gs.info(countResults);
var newestIncident = new GlideAggregate("incident");
newestIncident.addActiveQuery();
newestIncident.addAggregate("MAX","sys_created_on");
newestIncident.setGroup(false);// always do this when using MAX/MIN, it will implicitly stop the order by as well so you don't need to add setOrder(false)
newestIncident.query();
var maxResult = 0;
if (newestIncident.next()) {
    maxResult = newestIncident.getAggregate("MAX", "sys_created_on");
}
gs.info(maxResult);
Additional references:
[KB0745198] How to get the Top 10 values from a table using the GlideAggregate function
[KB0852541] How to get the MIN, MAX, AVG and SUM values from a table using the GlideAggregate functi...
If you only want the count, do not use getRowCount(); use the GlideAggregate COUNT option instead. The reason is that getRowCount() requires a query that streams every single matching result before it can do the count. However, if your code needs the count and also needs to loop through all of the GlideRecord results anyway, then getRowCount() is reasonable.
NOTE: Under the covers, GlideRecord works on a 2-pass query method.
var countActiveIncidents = new GlideRecord("incident");
countActiveIncidents.addActiveQuery();
countActiveIncidents.query();
gs.info(countActiveIncidents.getRowCount());
The only case where you might want to use this method is if, for some reason, it is more efficient to loop through a set of matching results and count only those results that match certain criteria. For example, perhaps the query to get the counts of different assignment groups that match a specific filter (e.g. u_my_field > 20) is really slow and you want to avoid the slow query by looping through all the results and applying the filter with code.
var countActiveIncidents = new GlideRecord("incident");
countActiveIncidents.addActiveQuery();
countActiveIncidents.query();
var count = 0;
while (countActiveIncidents._next()) {
    if (Number(countActiveIncidents.getValue("u_my_field")) > 20) {
        count++;
    }
}
gs.info(count);
The GlideRecord.get method has two documented method signatures. It can be called with one argument, "value", or with two arguments, "name" and "value". However, in the backend, there is really only one method that accepts two arguments, "name" and "value".
In practice, calling the one-argument GlideRecord.get method can trigger queries that you might not intend. To avoid these unintended queries, use the two-argument version of GlideRecord.get and validate that you are passing a valid "name" and "value".
Finally, as mentioned earlier in this article, using GlideRecord.get initiates a two-pass query, potentially returning all matching records. If you only want to return a single record, you should be using GlideRecord.setLimit(1) with GlideRecord.query().
NOTE: Some of the content in this Community article was taken directly from Sergiu's excellent blog post (with permission, of course), [Community] Performance considerations when using GlideRecord
Awesome
We have updated this page with a new section #9 - Aggregation. Also, check out the link to Brad Tilton's informative video about the new Console feature for the Script Debugger in Paris.
Not directly connected to the content of this page, but... I'm looking for information on the performance impact, if any, of calling a function within a large script include that includes many functions. Does anyone have any insights on this?
I don't know how ServiceNow & Rhino work for running server-side Javascript (and would be really interested to know more). E.g. is the code compiled down to Java, or interpreted. But I'm thinking that when I call a function in a large script include, ServiceNow/Rhino has to parse the code to find where that function starts. Which must take some time. And if it's being done in a loop (for example, during execution of a transform map processing lots of records), that feels like it's potentially quite an overhead.
The basic question is whether it's OK to have fewer larger script includes, or better to have more smaller script includes?
I did log a case in HI which came back with some information including that scripts can sometimes be compiled down to Java, or sometimes to pseudocode (pcode) which is then executed interpretatively. But that details of how all this works is internal to ServiceNow. They did advise that "compiling thousands of short scripts should be avoided because it tends to cause the system to run out of permgen / metaspace". Which suggests it's better to have fewer but longer script includes. But that then returns to the question of the overhead of parsing longer scripts in order to find a component function to execute.
I'm hoping this is the right place to be asking this question, but if not then please could you let me know if there's somewhere else (other than HI) where I could ask it.
Thanks
Michael
Hi Michael,
Since you've already asked in HI and I can't think of any additional resources that I could point you to, I'll ask around to see what tips we can provide here.
Caveat: This isn't the sort of thing that most customers should have to worry about, but obviously it is something that you've come across, and I am also aware of a few cases where other customers have hit performance issues in this area. Such cases are few and far between.
I should get back to you in the next day or so.
Hi Michael,
Apologies for taking so long to reply. I actually wrote this up a month ago and thought I had responded to your question at that time but just noticed I never did. After consideration, the best answer we can give is that it is better to have a smaller number of large script includes than it is to have a greater number of small script includes. Here's why:
Large, well-refactored scripts are better for development and code maintenance. In discussions within our team, the few times we have encountered performance problems with code compilation, they have involved scenarios with many, many small scripts (e.g. tens of thousands of scripted SLA conditions). In these cases the volume of code compilation ends up driving excessive cache sweeping and flushing activity. The flushing then causes secondary or tertiary impacts that degrade performance.
This raises the question: what is meant by "large" and "small"?
From a coding standards perspective there are many viewpoints on this, e.g. the "Rule of 30": each class no more than 30 methods, each method no more than 30 lines of code, and therefore each Script Include no more than 900 lines of code. I think that's a pretty good rule of thumb.
From the JavaScript caching perspective, we only cache "small" bits of code anyway. It might change in the future, but to give a general idea of how small we are talking, at the moment any code larger than 64KB gets interpreted anyway. By the way, if each line of a 900-line script include had 100 characters in it, it would be roughly 175KB (assuming it works the way I think it does, with each character taking 2 bytes as UTF-16).
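The arithmetic above can be sketched as a quick back-of-the-envelope check (the 2-bytes-per-character assumption reflects Java's in-memory UTF-16 string representation; the function name is invented for illustration):

```javascript
// Rough estimate of the in-memory size of a script, assuming
// 2 bytes per character (Java strings are UTF-16 internally).
function estimateScriptSizeBytes(lines, charsPerLine) {
    return lines * charsPerLine * 2;
}

// Scripts larger than ~64KB are interpreted rather than cached/compiled.
var CACHE_LIMIT_BYTES = 64 * 1024;

var size = estimateScriptSizeBytes(900, 100); // 180,000 bytes, ~175KB
var cached = size <= CACHE_LIMIT_BYTES;       // false: too big to cache
```

So even a Script Include that follows the Rule of 30 could exceed the caching threshold if its lines are long.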
We hope that is somewhat helpful!
ServiceNowPerformanceGTS
Many thanks for coming back on this. I'm generally interested in code quality and performance, and how things actually work 'under the hood'. It's encouraging to know that a smaller number of larger scripts is preferable. I think I'd struggle to get down to 30 lines of code per method though :-(.
Even just from a usage point of view, I'd have thought it must be far better to have associated methods grouped together into suitably named scripts, so that it's easier for someone to see what's already available and avoid recreating things unnecessarily.
Thanks again, and yes it's definitely helpful!
Paris is here! We've added a note with the reference to the new GlideQuery script API.
Added a note in the "Too much work, too quickly" section. We often see customers who have built items in their custom ServicePortal to show a user a list of "My Work" or "My Team's Work" in a header, footer or menu that loads on every page. This can become very problematic if the queries being used are inefficient (see Performance Best Practice for Efficient Queries - Top 10 Practices). "Inefficient" in this context might mean a query that takes as little as 100 milliseconds, depending on the nature of the query. The reason you have to be so careful in this context is that ServicePortal refreshes the header and footer often for various reasons (record watchers, page navigation, etc.), and thus a "My Work" or "My Assets" type of feature can quickly overload the system. If these queries cannot be near-instantaneous (i.e. less than 10 milliseconds) then they should not be automatically executed in the UI for every ServicePortal page load.
Instead, consider ways to load these features on-demand. Wait until a user hovers over the menu or clicks on a link before automatically executing the queries. More complex solutions might include using some type of a caching mechanism or creating a gating mechanism that will not re-run the same query for the same user more than once a minute.
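The gating idea above could be sketched as follows (illustrative only: the function and variable names are invented, and a real ServiceNow implementation would likely persist the timestamps in a table or cache rather than an in-memory object):

```javascript
// Illustrative per-user query gate: don't re-run the same expensive
// query for the same user more than once per interval.
var MIN_INTERVAL_MS = 60 * 1000; // at most once a minute per user

var lastRunByUser = {}; // userId -> timestamp (ms) of last execution

function shouldRunQuery(userId, nowMs) {
    var last = lastRunByUser[userId];
    if (last !== undefined && nowMs - last < MIN_INTERVAL_MS) {
        return false; // gated: serve the previously cached result instead
    }
    lastRunByUser[userId] = nowMs;
    return true;
}
```

With this gate, repeated header/footer refreshes within the same minute reuse the cached result instead of hitting the database again.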
We have updated this article with additional links to information about efficient use of GlideAggregate.
Added a note about the performance implications of the method GlideRecord.get(String, String).
More technical best practices can be found on the Developer portal.
Thanks for that helpful callout, Chuck! You're a legend, sir.
We've just put a new coat of paint on this article (particularly item #1, avoiding null/undefined variables) and we hope that you will bookmark it and return often. The content is dense and it covers a lot of ground, but we promise it is worth taking the time to understand these concepts deeply 🙂 We recommend incorporating these items into your code review practices and best practice guides.
Another item that we've seen from time to time is the use of gs.sleep(integer). This method freezes the executing thread. Sometimes folks use this to try to avoid race conditions when there are asynchronous processes running. For example, they might want to freeze a Script Action (sys_script_action table) for 10 seconds before querying for a certain record, because that record might have been created in a separate thread and they want to ensure it exists before they query for it.

The problem with this design is that freezing any thread means that any other work waiting for that thread incurs wait time. For example, freezing a Script Action means that the scheduled worker thread running the Event Processor job that is processing the Script Action will now be frozen. Event Processor jobs are often responsible for processing thousands of events (sysevent table) per second, and freezing the job for 10 seconds may create a major bottleneck in the Event Queue.

As a general rule, you should never use gs.sleep(integer) except for testing purposes. An alternative is to schedule some work for the future rather than sleeping the thread. This can be done in a few different ways. For example, you can use the API gs.eventQueueScheduled(String name, Object instance, String parm1, String parm2, Object expiration) to schedule an event that should be processed in the future.
See [Official Dev Site] GlideSystem.eventQueueScheduled API
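A minimal sketch of the scheduling pattern described above. In ServiceNow the gs object is provided by the platform; here it is stubbed so the sketch is self-contained, and the event name 'x_app.record.ready' and the fireWhenReady helper are hypothetical:

```javascript
// Stub of the platform-provided gs object, recording scheduled events
// so the pattern is visible outside a ServiceNow instance.
var scheduled = [];
var gs = {
    // Signature per the article: (name, instance, parm1, parm2, expiration)
    eventQueueScheduled: function (name, instance, parm1, parm2, expiration) {
        scheduled.push({ name: name, instance: instance, parm1: parm1,
                         parm2: parm2, expiration: expiration });
    }
};

// Instead of gs.sleep(10000) followed by a query, schedule an event to be
// processed at a future time, when the record is expected to exist.
function fireWhenReady(record, whenGmt) {
    gs.eventQueueScheduled('x_app.record.ready', record, record.number, '', whenGmt);
}

fireWhenReady({ number: 'TASK0010001' }, '2024-01-01 00:00:10');
```

The worker thread returns immediately, and the deferred event is picked up by the event processor at (or after) the scheduled time, avoiding any frozen thread.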