The Now Platform® Washington DC release is live. Watch now!

Help
cancel
Showing results for 
Show  only  | Search instead for 
Did you mean: 
sergiu_panaite
ServiceNow Employee
ServiceNow Employee

If you've had a chance to review Business Rule Best Practices (Wiki link: here) then you will know an important consideration is knowing WHEN to run business rules. This post focuses on ASYNC business rules since these are commonly used to address performance issues. When used correctly, ASYNC business rules can provide an instance-wide performance boost to some transactions. However, there's a few scenarios I would like to draw your attention to. To do that let's first begin by understanding how ASYNC business rules execute in the platform.

What happens when a user action triggers an ASYNC business rule: the platform creates a scheduled job (inserts a record in sys_trigger table) from the business rule after the user submits the form and after any action is taken on the record on the database. From this point of view, the ASYNC business rules are similar to after business rules, but ASYNC rules run in the background, allowing the user to carry out without having to wait for the business rule to complete (after rules are executed synchronously, so the user has to wait for the business rule to complete before gets the control back).

Asynchronous business rules are generally fine if they run quickly. However, depending on the frequency and duration of execution, they can be detrimental. Normal Business Rules can also be detrimental if they run slowly but the effects are different. The following explains some of the differences.

  • Poorly performing Asynchronous business rules:
    • The user transaction that triggered the business rule will complete before the Business Rule is run
    • The user will be able to immediately perform a second transaction without waiting for the Business Rule to complete
    • The Business Rule will consume a Scheduler Worker (8) as long as it runs. If several users are doing similar transactions that trigger same ASYNC business rule that takes a long time to execute we could potentially end up with all Scheduler Workers running only the jobs triggered by the ASYNC business rule. In that case:
      • SLAs updates will be delayed for the period of ASYNC business rule runtime
      • Events processing will be delayed as well for the period of ASYNC business rule runtime (this in turn can cause emails delays)
      • Discovery sensors processing will be delayed
      • ...
  • Poorly performing Synchronous business rules:
    • The business rule transaction will be same as user transaction so it will consume a Default semaphore (16) as long as it runs
    • The user will not be able to perform a second transaction until the Business Rules completes and the transaction finishes
    • If several users are doing similar transactions that trigger same SYNC business rule we could potentially end up with all Default semaphores consumed (16) with transactions for that particular SYNC business rule. In that case:
      • New user transactions will be queued until a semaphores is available

To give an example of an ASYNC business rule I created one like this:

find_real_file.png

find_real_file.png

Now, when I insert/update an incident the business rule will be triggered and platform creates a scheduled job like this:

find_real_file.png

In the above print screen you can see that:

  1. Scheduled job has a name under the form: "ASYNC: <name of business rule>": "ASYNC: TestingASYNCBR" for my example
  2. In Job context we see a reference to the business rule sys_id (marked with the red rectangle)

With above information it's easy to identify what business rule has triggered this particular scheduled job.

If ASYNC business gives controls back to user immediately (great user experience), why do we need to think about performance considerations?

As these business rules run in the background, their actions can cause server side contention / performance degradation. Running in the background means they will be run under one of the available workers (each application node typically has 8 workers).

Scenario 1: ASYNC Business Rules take a long time to execute

If I consider for testing my own instance (which has only one application node) and I take the example provided previously and modify gs.sleep like this:

gs.sleep(1000000);

then do quickly more than 8 updates on incidents, I end up with the following:

find_real_file.png

Checking stats.do page (https://<myinstance>.service-now.com/stats.do) I see the following:

 


Background Scheduler
Number of workers: 8
Queue length: 12
Mean queue age: 0:01:09.989
System ID: <hostname>.ams3.service-now.com:<node_name>
Total jobs: 606747
Total burst workers: 1
Scheduler run state: true

Scheduler Workers
glide.scheduler.worker.0
Current job: ASYNC: TestingASYNCBR
Job started: Mon Jan 04 23:48:22 PST 2016
Job duration: 0:01:55.764
Total jobs: 3787
Mean duration: 0:00:00.234

glide.scheduler.worker.1
Current job: ASYNC: TestingASYNCBR
Job started: Mon Jan 04 23:47:49 PST 2016
Job duration: 0:02:27.832
Total jobs: 3689
Mean duration: 0:00:00.416

glide.scheduler.worker.2
Current job: ASYNC: TestingASYNCBR
Job started: Mon Jan 04 23:48:10 PST 2016
Job duration: 0:02:07.411
Total jobs: 3794
Mean duration: 0:00:00.134

glide.scheduler.worker.3
Current job: ASYNC: TestingASYNCBR
Job started: Mon Jan 04 23:47:50 PST 2016
Job duration: 0:02:27.740
Total jobs: 3810
Mean duration: 0:00:00.139

glide.scheduler.worker.4

Current job: ASYNC: TestingASYNCBR
Job started: Mon Jan 04 23:48:00 PST 2016
Job duration: 0:02:17.799
Total jobs: 3796
Mean duration: 0:00:00.158

glide.scheduler.worker.5
Current job: ASYNC: TestingASYNCBR
Job started: Mon Jan 04 23:47:50 PST 2016
Job duration: 0:02:27.712
Total jobs: 3720
Mean duration: 0:00:00.353

glide.scheduler.worker.6
Current job: ASYNC: TestingASYNCBR
Job started: Mon Jan 04 23:47:59 PST 2016
Job duration: 0:02:17.991
Total jobs: 3772
Mean duration: 0:00:00.170

glide.scheduler.worker.7
Current job: ASYNC: TestingASYNCBR
Job started: Mon Jan 04 23:48:09 PST 2016
Job duration: 0:02:07.812
Total jobs: 3805
Mean duration: 0:00:00.141

 


The stats.do page gives me the following information:

  1. All workers are busy now with running instances of that ASYNC business rule
  2. Running several times stats.do shows me that "Queue length" (marked with red in above stats.do output) is slowly increasing, meaning other jobs are queued waiting to run.

Clicking on the blue link on one of the workers like:

glide.scheduler.worker.4

we get the stack trace of this job that points out what job it is and at which line code is spending the time.

Threads for: Demo Server @ <hostname>.ams3.service-now.com:<app_node_port> at: Mon Jan 04 23:49:02 PST 2016 (Refresh)
Connected to cluster node: <hostname>.ams3.service-now.com:<app_node_name>
Build date: 11-07-2015_1248
Build tag: glide-geneva-08-25-2015__patch0-hotfix1-11-06-2015

main,glide.scheduler.worker.0,4,ASYNC: TestingASYNCBR (40248 ms)

java.lang.Thread.sleep(Native Method)

com.glide.script.GlideSystem.js_sleep(GlideSystem.java:978)                

sun.reflect.GeneratedMethodAccessor637.invoke(Unknown Source)

sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)                

java.lang.reflect.Method.invoke(Method.java:597)

org.mozilla.javascript.FunctionObject.doInvoke(FunctionObject.java:597)

org.mozilla.javascript.FunctionObject.call(FunctionObject.java:504)

org.mozilla.javascript.ScriptRuntime.call(ScriptRuntime.java:1227)                

org.mozilla.javascript.gen.c7635.call(sys_trigger.b878bd496fb01600e5e2e0526e3ee44c:4)

org.mozilla.javascript.ScriptRuntime.call(ScriptRuntime.java:1227)

org.mozilla.javascript.gen.c7636.call(sys_trigger.b878bd496fb01600e5e2e0526e3ee44c:1)

org.mozilla.javascript.gen.c7636.exec(sys_trigger.b878bd496fb01600e5e2e0526e3ee44c)

com.glide.script.ScriptEvaluator.execute(ScriptEvaluator.java:233)

com.glide.script.ScriptEvaluator.evaluateString(ScriptEvaluator.java:105)

com.glide.script.ScriptEvaluator.evaluateString(ScriptEvaluator.java:72)            

com.glide.script.Evaluator.evaluatePossiblePrefixedString(Evaluator.java:192)      

com.glide.job.RunScriptJob.evaluateScript(RunScriptJob.java:158)  

com.glide.job.RunScriptJob.runScript(RunScriptJob.java:121)                

com.glide.job.RunScriptJob.execute(RunScriptJob.java:88)

com.glide.schedule.JobExecutor.execute(JobExecutor.java:80)

com.glide.schedule.GlideScheduleWorker.executeJob(GlideScheduleWorker.java:191)

com.glide.schedule.GlideScheduleWorker.process(GlideScheduleWorker.java:135)

com.glide.schedule.GlideScheduleWorker.run(GlideScheduleWorker.java:57)

 

In my case:

org.mozilla.javascript.gen.c7635.call(sys_trigger.b878bd496fb01600e5e2e0526e3ee44c:4)

points to the job triggered by the ASYNC business rule which has the sleep call at line 4:

find_real_file.png

A quick look into sys_trigger table shows all the queued jobs:

find_real_file.png

Conclusion: the above scenario can delay the execution of one or more important scheduled jobs (including out of box ones), like "events process", "SMTP Sender", etc and this is usually translated into performance degradation. Imagine that your emails will be sent with at least 1000s delay.

Note : ASYNC jobs are deleted from the sys_trigger table after execution is complete, so gather your evidence quickly!

Scenario 2: ASYNC Business Rules get stuck

This scenario is based on real incidents opened by customers of ServiceNow.

Consider the following example of ASYNC business rule code:

find_real_file.png

Do you already see the mistake in the code?

If not, let's perform the test by updating at least 8 incidents. Checking after stats.do page I see:

 


  Background Scheduler
Number of workers: 8
Queue length: 10
Mean queue age: 0:00:30.398
System ID: <hostname>.ams3.service-now.com:<app_node_name>
Total jobs: 611108
Total burst workers: 12
Scheduler run state: true

Scheduler Workers
glide.scheduler.worker.0
Current job: ASYNC: TestingASYNCBR
Job started: Tue Jan 05 02:23:11 PST 2016
Job duration: 0:00:46.601
Total jobs: 4326
Mean duration: 0:00:00.429

glide.scheduler.worker.1
Current job: ASYNC: TestingASYNCBR
Job started: Tue Jan 05 02:22:49 PST 2016
Job duration: 0:01:08.084
Total jobs: 4243
Mean duration: 0:00:00.589

glide.scheduler.worker.2
Current job: ASYNC: TestingASYNCBR
Job started: Tue Jan 05 02:22:40 PST 2016
Job duration: 0:01:17.778
Total jobs: 4314
Mean duration: 0:00:00.372

glide.scheduler.worker.3
Current job: ASYNC: TestingASYNCBR
Job started: Tue Jan 05 02:22:29 PST 2016
Job duration: 0:01:27.990
Total jobs: 4357
Mean duration: 0:00:00.348

glide.scheduler.worker.4
Current job: ASYNC: TestingASYNCBR
Job started: Tue Jan 05 02:22:40 PST 2016
Job duration: 0:01:17.517
Total jobs: 4329
Mean duration: 0:00:00.367


glide.scheduler.worker.5
Current job: ASYNC: TestingASYNCBR
Job started: Tue Jan 05 02:22:29 PST 2016
Job duration: 0:01:28.078
Total jobs: 4256
Mean duration: 0:00:00.540

glide.scheduler.worker.6
Current job: ASYNC: TestingASYNCBR
Job started: Tue Jan 05 02:22:40 PST 2016
Job duration: 0:01:17.703
Total jobs: 4298
Mean duration: 0:00:00.377

glide.scheduler.worker.7
Current job: ASYNC: TestingASYNCBR
Job started: Tue Jan 05 02:23:00 PST 2016
Job duration: 0:00:57.652
Total jobs: 4355
Mean duration: 0:00:00.347

 


Checking stack trace for one of the workers we see:

Threads for: Demo Server @ app55035.ams3.service-now.com:16022 at: Tue Jan 05 02:23:46 PST 2016 (Refresh)
Connected to cluster node: <hostname>.ams3.service-now.com:<app_node_name>
Build date: 11-07-2015_1248
Build tag: glide-geneva-08-25-2015__patch0-hotfix1-11-06-2015

main,glide.scheduler.worker.4,4,ASYNC: TestingASYNCBR (66511 ms)

com.glide.sys.cache.CacheMan.get(CacheMan.java:83)

com.glide.sys.cache.CacheManager.get(CacheManager.java:63)

com.glide.db.TableDescriptorProvider.getSchemaFromCache(TableDescriptorProvider.java:86)

com.glide.db.TableDescriptor.getSchema(TableDescriptor.java:191)

com.glide.db.TableDescriptor.getElementDescriptor(TableDescriptor.java:355)

com.glide.db.TableDescriptor.getElementDescriptor(TableDescriptor.java:344)

com.glide.db.domain.DomainMonitor.onExecute0(DomainMonitor.java:90)

com.glide.db.domain.DomainMonitor.hackBeforeExecute(DomainMonitor.java:67)

com.glide.db.domain.DomainSupport.hackBeforeExecute(DomainSupport.java:935)                

com.glide.db.DBCompositeAction.setSystemFields(DBCompositeAction.java:331)

com.glide.db.DBCompositeAction.beforeExecute(DBCompositeAction.java:74)

com.glide.db.DBInsert.beforeExecute(DBInsert.java:481)                

com.glide.db.DBAction.executeAndReturnException(DBAction.java:165)

com.glide.script.GlideRecordITable.insert(GlideRecordITable.java:116)          

com.glide.script.GlideRecord.insert(GlideRecord.java:4392)

com.glide.script.GlideRecord.insert(GlideRecord.java:4323)                

com.glide.sys.SysLog.log(SysLog.java:183)                

com.glide.sys.SysLog.log(SysLog.java:149)                

com.glide.sys.SysLog.info(SysLog.java:33)                

com.glide.sys.SysLog.info(SysLog.java:24)                

com.glide.script.GlideSystem.jsFunction_log(GlideSystem.java:1144)                

sun.reflect.GeneratedMethodAccessor729.invoke(Unknown Source)                

sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)                

java.lang.reflect.Method.invoke(Method.java:597)                

org.mozilla.javascript.FunctionObject.doInvoke(FunctionObject.java:597)                

org.mozilla.javascript.FunctionObject.call(FunctionObject.java:504)                

org.mozilla.javascript.ScriptRuntime.call(ScriptRuntime.java:1227)                

org.mozilla.javascript.gen.c7659.call(sys_trigger.e6db56c16ff01600e5e2e0526e3ee44f:9)                

org.mozilla.javascript.ScriptRuntime.call(ScriptRuntime.java:1227)                

org.mozilla.javascript.gen.c7660.call(sys_trigger.e6db56c16ff01600e5e2e0526e3ee44f:1)                

org.mozilla.javascript.gen.c7660.exec(sys_trigger.e6db56c16ff01600e5e2e0526e3ee44f)                

com.glide.script.ScriptEvaluator.execute(ScriptEvaluator.java:233)                

com.glide.script.ScriptEvaluator.evaluateString(ScriptEvaluator.java:105)                

com.glide.script.ScriptEvaluator.evaluateString(ScriptEvaluator.java:72)                

com.glide.script.Evaluator.evaluatePossiblePrefixedString(Evaluator.java:192)                

com.glide.job.RunScriptJob.evaluateScript(RunScriptJob.java:158)                

com.glide.job.RunScriptJob.runScript(RunScriptJob.java:121)                

com.glide.job.RunScriptJob.execute(RunScriptJob.java:88)                

com.glide.schedule.JobExecutor.execute(JobExecutor.java:80)                

com.glide.schedule.GlideScheduleWorker.executeJob(GlideScheduleWorker.java:191)                

com.glide.schedule.GlideScheduleWorker.process(GlideScheduleWorker.java:135)                

com.glide.schedule.GlideScheduleWorker.run(GlideScheduleWorker.java:57)

Stack trace points to line 9 in the ASYNC business rule which can get a bit confusing as line actually is:

gs.log('Testing');

How can this cause the job to get stuck in the first place? The error in this case is actually a syntax error on line 8, as I have not properly called gr.next function by omitting the parentheses, code required being actually:

while(gr.next()){

          //do something

}

Possible solutions to stop these stuck jobs and stop creation of new ones:

  1. Deactivate the faulty ASYNC business rule
  2. Go to User Administration -> All Active Transactions. Select each of these transactions and from "Action on select rows" choose Kill.   This might not work for you if you're logged in on one of the application nodes where jobs are not running, so you might need ServiceNow support assistance.
  3. If above doesn't work only solution in this case is to restart the application node, action done directly by ServiceNow customer support personnel (this would also involve doing step 1 and removing from sys_trigger the faulty jobs).

Conclusion: the above scenario has much more impact than previous scenario as it causes workers to be stuck forever, until application node is restarted.

General performance considerations:

  1. Make sure your ASYNC business rule is properly tested on a subproduction environment and make sure you know what response time is
  2. Make sure you validate your code (not only for ASYNC business rules, but for every script) from a syntax point of view.

If you're also interested in performance considerations when using GlideRecord you can have a look at this post:

Performance considerations when using GlideRecord

6 Comments