The Now Platform® Washington DC release is live. Watch now!
If you've had a chance to review Business Rule Best Practices (Wiki link: here) then you will know an important consideration is knowing WHEN to run business rules. This post focuses on ASYNC business rules since these are commonly used to address performance issues. When used correctly, ASYNC business rules can provide an instance-wide performance boost to some transactions. However, there's a few scenarios I would like to draw your attention to. To do that let's first begin by understanding how ASYNC business rules execute in the platform.
What happens when a user action triggers an ASYNC business rule: the platform creates a scheduled job (inserts a record in sys_trigger table) from the business rule after the user submits the form and after any action is taken on the record on the database. From this point of view, the ASYNC business rules are similar to after business rules, but ASYNC rules run in the background, allowing the user to carry out without having to wait for the business rule to complete (after rules are executed synchronously, so the user has to wait for the business rule to complete before gets the control back).
Asynchronous business rules are generally fine if they run quickly. However, depending on the frequency and duration of execution, they can be detrimental. Normal Business Rules can also be detrimental if they run slowly but the effects are different. The following explains some of the differences.
To give an example of an ASYNC business rule I created one like this:
Now, when I insert/update an incident the business rule will be triggered and platform creates a scheduled job like this:
In the above print screen you can see that:
With above information it's easy to identify what business rule has triggered this particular scheduled job.
If ASYNC business gives controls back to user immediately (great user experience), why do we need to think about performance considerations?
As these business rules run in the background, their actions can cause server side contention / performance degradation. Running in the background means they will be run under one of the available workers (each application node typically has 8 workers).
Scenario 1: ASYNC Business Rules take a long time to execute
If I consider for testing my own instance (which has only one application node) and I take the example provided previously and modify gs.sleep like this:
gs.sleep(1000000);
then do quickly more than 8 updates on incidents, I end up with the following:
Checking stats.do page (https://<myinstance>.service-now.com/stats.do) I see the following:
Background Scheduler
Number of workers: 8
Queue length: 12
Mean queue age: 0:01:09.989
System ID: <hostname>.ams3.service-now.com:<node_name>
Total jobs: 606747
Total burst workers: 1
Scheduler run state: true
Scheduler Workers
glide.scheduler.worker.0
Current job: ASYNC: TestingASYNCBR
Job started: Mon Jan 04 23:48:22 PST 2016
Job duration: 0:01:55.764
Total jobs: 3787
Mean duration: 0:00:00.234
glide.scheduler.worker.1
Current job: ASYNC: TestingASYNCBR
Job started: Mon Jan 04 23:47:49 PST 2016
Job duration: 0:02:27.832
Total jobs: 3689
Mean duration: 0:00:00.416
glide.scheduler.worker.2
Current job: ASYNC: TestingASYNCBR
Job started: Mon Jan 04 23:48:10 PST 2016
Job duration: 0:02:07.411
Total jobs: 3794
Mean duration: 0:00:00.134
glide.scheduler.worker.3
Current job: ASYNC: TestingASYNCBR
Job started: Mon Jan 04 23:47:50 PST 2016
Job duration: 0:02:27.740
Total jobs: 3810
Mean duration: 0:00:00.139
glide.scheduler.worker.4
Current job: ASYNC: TestingASYNCBR
Job started: Mon Jan 04 23:48:00 PST 2016
Job duration: 0:02:17.799
Total jobs: 3796
Mean duration: 0:00:00.158
glide.scheduler.worker.5
Current job: ASYNC: TestingASYNCBR
Job started: Mon Jan 04 23:47:50 PST 2016
Job duration: 0:02:27.712
Total jobs: 3720
Mean duration: 0:00:00.353
glide.scheduler.worker.6
Current job: ASYNC: TestingASYNCBR
Job started: Mon Jan 04 23:47:59 PST 2016
Job duration: 0:02:17.991
Total jobs: 3772
Mean duration: 0:00:00.170
glide.scheduler.worker.7
Current job: ASYNC: TestingASYNCBR
Job started: Mon Jan 04 23:48:09 PST 2016
Job duration: 0:02:07.812
Total jobs: 3805
Mean duration: 0:00:00.141
The stats.do page gives me the following information:
Clicking on the blue link on one of the workers like:
glide.scheduler.worker.4
we get the stack trace of this job that points out what job it is and at which line code is spending the time.
Threads for: Demo Server @ <hostname>.ams3.service-now.com:<app_node_port> at: Mon Jan 04 23:49:02 PST 2016 (Refresh)
Connected to cluster node: <hostname>.ams3.service-now.com:<app_node_name>
Build date: 11-07-2015_1248
Build tag: glide-geneva-08-25-2015__patch0-hotfix1-11-06-2015
main,glide.scheduler.worker.0,4,ASYNC: TestingASYNCBR (40248 ms)
java.lang.Thread.sleep(Native Method)
com.glide.script.GlideSystem.js_sleep(GlideSystem.java:978)
sun.reflect.GeneratedMethodAccessor637.invoke(Unknown Source)
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
java.lang.reflect.Method.invoke(Method.java:597)
org.mozilla.javascript.FunctionObject.doInvoke(FunctionObject.java:597)
org.mozilla.javascript.FunctionObject.call(FunctionObject.java:504)
org.mozilla.javascript.ScriptRuntime.call(ScriptRuntime.java:1227)
org.mozilla.javascript.gen.c7635.call(sys_trigger.b878bd496fb01600e5e2e0526e3ee44c:4)
org.mozilla.javascript.ScriptRuntime.call(ScriptRuntime.java:1227)
org.mozilla.javascript.gen.c7636.call(sys_trigger.b878bd496fb01600e5e2e0526e3ee44c:1)
org.mozilla.javascript.gen.c7636.exec(sys_trigger.b878bd496fb01600e5e2e0526e3ee44c)
com.glide.script.ScriptEvaluator.execute(ScriptEvaluator.java:233)
com.glide.script.ScriptEvaluator.evaluateString(ScriptEvaluator.java:105)
com.glide.script.ScriptEvaluator.evaluateString(ScriptEvaluator.java:72)
com.glide.script.Evaluator.evaluatePossiblePrefixedString(Evaluator.java:192)
com.glide.job.RunScriptJob.evaluateScript(RunScriptJob.java:158)
com.glide.job.RunScriptJob.runScript(RunScriptJob.java:121)
com.glide.job.RunScriptJob.execute(RunScriptJob.java:88)
com.glide.schedule.JobExecutor.execute(JobExecutor.java:80)
com.glide.schedule.GlideScheduleWorker.executeJob(GlideScheduleWorker.java:191)
com.glide.schedule.GlideScheduleWorker.process(GlideScheduleWorker.java:135)
com.glide.schedule.GlideScheduleWorker.run(GlideScheduleWorker.java:57)
In my case:
org.mozilla.javascript.gen.c7635.call(sys_trigger.b878bd496fb01600e5e2e0526e3ee44c:4)
points to the job triggered by the ASYNC business rule which has the sleep call at line 4:
A quick look into sys_trigger table shows all the queued jobs:
Conclusion: the above scenario can delay the execution of one or more important scheduled jobs (including out of box ones), like "events process", "SMTP Sender", etc and this is usually translated into performance degradation. Imagine that your emails will be sent with at least 1000s delay.
Note : ASYNC jobs are deleted from the sys_trigger table after execution is complete, so gather your evidence quickly!
Scenario 2: ASYNC Business Rules get stuck
This scenario is based on real incidents opened by customers of ServiceNow.
Consider the following example of ASYNC business rule code:
Do you already see the mistake in the code?
If not, let's perform the test by updating at least 8 incidents. Checking after stats.do page I see:
Background Scheduler
Number of workers: 8
Queue length: 10
Mean queue age: 0:00:30.398
System ID: <hostname>.ams3.service-now.com:<app_node_name>
Total jobs: 611108
Total burst workers: 12
Scheduler run state: true
Scheduler Workers
glide.scheduler.worker.0
Current job: ASYNC: TestingASYNCBR
Job started: Tue Jan 05 02:23:11 PST 2016
Job duration: 0:00:46.601
Total jobs: 4326
Mean duration: 0:00:00.429
glide.scheduler.worker.1
Current job: ASYNC: TestingASYNCBR
Job started: Tue Jan 05 02:22:49 PST 2016
Job duration: 0:01:08.084
Total jobs: 4243
Mean duration: 0:00:00.589
glide.scheduler.worker.2
Current job: ASYNC: TestingASYNCBR
Job started: Tue Jan 05 02:22:40 PST 2016
Job duration: 0:01:17.778
Total jobs: 4314
Mean duration: 0:00:00.372
glide.scheduler.worker.3
Current job: ASYNC: TestingASYNCBR
Job started: Tue Jan 05 02:22:29 PST 2016
Job duration: 0:01:27.990
Total jobs: 4357
Mean duration: 0:00:00.348
glide.scheduler.worker.4
Current job: ASYNC: TestingASYNCBR
Job started: Tue Jan 05 02:22:40 PST 2016
Job duration: 0:01:17.517
Total jobs: 4329
Mean duration: 0:00:00.367
glide.scheduler.worker.5
Current job: ASYNC: TestingASYNCBR
Job started: Tue Jan 05 02:22:29 PST 2016
Job duration: 0:01:28.078
Total jobs: 4256
Mean duration: 0:00:00.540
glide.scheduler.worker.6
Current job: ASYNC: TestingASYNCBR
Job started: Tue Jan 05 02:22:40 PST 2016
Job duration: 0:01:17.703
Total jobs: 4298
Mean duration: 0:00:00.377
glide.scheduler.worker.7
Current job: ASYNC: TestingASYNCBR
Job started: Tue Jan 05 02:23:00 PST 2016
Job duration: 0:00:57.652
Total jobs: 4355
Mean duration: 0:00:00.347
Checking stack trace for one of the workers we see:
Threads for: Demo Server @ app55035.ams3.service-now.com:16022 at: Tue Jan 05 02:23:46 PST 2016 (Refresh)
Connected to cluster node: <hostname>.ams3.service-now.com:<app_node_name>
Build date: 11-07-2015_1248
Build tag: glide-geneva-08-25-2015__patch0-hotfix1-11-06-2015
main,glide.scheduler.worker.4,4,ASYNC: TestingASYNCBR (66511 ms)
com.glide.sys.cache.CacheMan.get(CacheMan.java:83)
com.glide.sys.cache.CacheManager.get(CacheManager.java:63)
com.glide.db.TableDescriptorProvider.getSchemaFromCache(TableDescriptorProvider.java:86)
com.glide.db.TableDescriptor.getSchema(TableDescriptor.java:191)
com.glide.db.TableDescriptor.getElementDescriptor(TableDescriptor.java:355)
com.glide.db.TableDescriptor.getElementDescriptor(TableDescriptor.java:344)
com.glide.db.domain.DomainMonitor.onExecute0(DomainMonitor.java:90)
com.glide.db.domain.DomainMonitor.hackBeforeExecute(DomainMonitor.java:67)
com.glide.db.domain.DomainSupport.hackBeforeExecute(DomainSupport.java:935)
com.glide.db.DBCompositeAction.setSystemFields(DBCompositeAction.java:331)
com.glide.db.DBCompositeAction.beforeExecute(DBCompositeAction.java:74)
com.glide.db.DBInsert.beforeExecute(DBInsert.java:481)
com.glide.db.DBAction.executeAndReturnException(DBAction.java:165)
com.glide.script.GlideRecordITable.insert(GlideRecordITable.java:116)
com.glide.script.GlideRecord.insert(GlideRecord.java:4392)
com.glide.script.GlideRecord.insert(GlideRecord.java:4323)
com.glide.sys.SysLog.log(SysLog.java:183)
com.glide.sys.SysLog.log(SysLog.java:149)
com.glide.sys.SysLog.info(SysLog.java:33)
com.glide.sys.SysLog.info(SysLog.java:24)
com.glide.script.GlideSystem.jsFunction_log(GlideSystem.java:1144)
sun.reflect.GeneratedMethodAccessor729.invoke(Unknown Source)
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
java.lang.reflect.Method.invoke(Method.java:597)
org.mozilla.javascript.FunctionObject.doInvoke(FunctionObject.java:597)
org.mozilla.javascript.FunctionObject.call(FunctionObject.java:504)
org.mozilla.javascript.ScriptRuntime.call(ScriptRuntime.java:1227)
org.mozilla.javascript.gen.c7659.call(sys_trigger.e6db56c16ff01600e5e2e0526e3ee44f:9)
org.mozilla.javascript.ScriptRuntime.call(ScriptRuntime.java:1227)
org.mozilla.javascript.gen.c7660.call(sys_trigger.e6db56c16ff01600e5e2e0526e3ee44f:1)
org.mozilla.javascript.gen.c7660.exec(sys_trigger.e6db56c16ff01600e5e2e0526e3ee44f)
com.glide.script.ScriptEvaluator.execute(ScriptEvaluator.java:233)
com.glide.script.ScriptEvaluator.evaluateString(ScriptEvaluator.java:105)
com.glide.script.ScriptEvaluator.evaluateString(ScriptEvaluator.java:72)
com.glide.script.Evaluator.evaluatePossiblePrefixedString(Evaluator.java:192)
com.glide.job.RunScriptJob.evaluateScript(RunScriptJob.java:158)
com.glide.job.RunScriptJob.runScript(RunScriptJob.java:121)
com.glide.job.RunScriptJob.execute(RunScriptJob.java:88)
com.glide.schedule.JobExecutor.execute(JobExecutor.java:80)
com.glide.schedule.GlideScheduleWorker.executeJob(GlideScheduleWorker.java:191)
com.glide.schedule.GlideScheduleWorker.process(GlideScheduleWorker.java:135)
com.glide.schedule.GlideScheduleWorker.run(GlideScheduleWorker.java:57)
Stack trace points to line 9 in the ASYNC business rule which can get a bit confusing as line actually is:
gs.log('Testing');
How can this cause the job to get stuck in the first place? The error in this case is actually a syntax error on line 8, as I have not properly called gr.next function by omitting the parentheses, code required being actually:
while(gr.next()){
//do something
}
Possible solutions to stop these stuck jobs and stop creation of new ones:
Conclusion: the above scenario has much more impact than previous scenario as it causes workers to be stuck forever, until application node is restarted.
General performance considerations:
If you're also interested in performance considerations when using GlideRecord you can have a look at this post:
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.