Outbound Integrations Using SOAP / REST: Performance Best Practices

GTSPerformance · ‎10-15-2020


Background

This guide is written by the ServiceNow Technical Support Performance team. We are a global group of experts who help our customers with performance issues. If you have questions about the content of this article, we will try to answer them here. However, if you have urgent questions or specific issues, please see the list of resources on our profile page: ServiceNowPerformanceGTS

ServiceNow implementations often need to interact with external systems via an outbound integration using either Simple Object Access Protocol (SOAP) or Representational State Transfer (REST) Web APIs. The ServiceNow platform allows such integrations to be built in a number of different ways, e.g.:

Synchronous or asynchronous
Direct from application nodes or via a MID server

See our official documentation here:

https://docs.servicenow.com/bundle/tokyo-application-development/page/integrate/web-services/referen...

Getting the design of an integration correct can be hugely important. Poor design can impact performance, cause end user frustration, or even lead to service affecting events within an instance. This article discusses the various options available along with the pros and cons of each approach – hopefully this makes deciding how to develop outbound integrations that little bit easier.

Using a MID server

All outbound integrations require a SOAP or REST ‘message’ to be created, with corresponding functions/methods defined via the sys_soap_message or sys_rest_message tables (navigate to ‘System Web Services -> Outbound’). Most of the configuration of these ‘messages’ depends on details of the external web service, so it is not covered here. However, it is worth mentioning MID servers.

Within each function (SOAP) or method (REST) there is a field allowing a MID server (or MID cluster) to be specified.

If this field is populated then the integration will run via a MID server (i.e. the MID server will be responsible for physically communicating with the external web service). This can have important ramifications:

If the external endpoint is not publicly accessible (i.e. it can only be reached from within a given environment) the integration cannot run directly from ServiceNow application nodes. In this case a MID server located within the corresponding environment has to be used
Even if the external endpoint is publicly accessible it may be desirable to run the integration via MID servers to take some load away from the instance/application nodes – again, in this case, a MID server should be specified
If no MID server is provided and the field is left blank the integration will, by default, run directly from application nodes within the instance

Note that the MID server specified in the function or method can be overridden at runtime via use of the ‘setMIDServer()’ method.
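As a minimal sketch of the runtime override, the helper below sets a MID server on an outbound message before sending it. The helper name, the 'Inventory API' message, its 'get' method and the 'mid_dmz_01' MID server are all hypothetical. The message object is passed in as a parameter; on an instance it would be a RESTMessageV2 or SOAPMessageV2 object.

```javascript
// Hypothetical helper: route an outbound message through a specific MID server.
// 'message' is expected to be an sn_ws.RESTMessageV2 (or SOAPMessageV2) object.
function sendViaMidServer(message, midServerName) {
    if (midServerName) {
        // Overrides whatever MID server is configured on the function/method record.
        message.setMIDServer(midServerName);
    }
    // Asynchronous send, so the calling thread is not blocked.
    return message.executeAsync();
}

// On an instance (names are assumptions, not real records):
//   var rm = new sn_ws.RESTMessageV2('Inventory API', 'get');
//   sendViaMidServer(rm, 'mid_dmz_01');
```

Passing an empty name leaves the configured value (or the default of running from application nodes) in place.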

Synchronous vs asynchronous

Once SOAP/REST messages (and corresponding functions/methods) have been defined they can be specified within APIs (such as ‘RESTMessageV2’) to trigger use of the integration. Both SOAP and REST APIs can be executed in two ways:

Synchronously (via the ‘execute()’ method)
- Blocks the corresponding thread until a response is received from the endpoint or a timeout occurs
- Only uses the ECC queue (ecc_queue) table during processing if the request is executed via a MID server
- Any response can be processed via the same thread which initiated the web service call
Asynchronously (via the ‘executeAsync()’ method)
- Generally does not block the initial thread – delivery of the outbound message to the external endpoint is initiated on a secondary thread, then the initial thread continues running (note that there are some caveats here as explained below)
- Always uses the ECC queue (ecc_queue) table during processing, regardless of whether a MID is used or not
- Ideally, responses should not be processed via the same thread which initiated the web service call, because waiting for them blocks that thread again
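The difference between the two modes can be sketched as follows. The two helper names are hypothetical, and the message object is passed in so that the sketch does not depend on any particular SOAP/REST message record; on an instance it would be created with, for example, new sn_ws.RESTMessageV2(...).

```javascript
// Synchronous: the calling thread blocks inside execute() until the endpoint
// responds or a timeout occurs, then reads the response directly.
function callSynchronously(message) {
    var response = message.execute();
    return { status: response.getStatusCode(), body: response.getBody() };
}

// Asynchronous, fire and forget: executeAsync() hands the request off via the
// ECC queue and returns. The response object is deliberately not read here,
// because reading it would make the caller wait.
function callAsynchronously(message) {
    message.executeAsync();
}
```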

Timeouts

Requests to external endpoints are subject to various timeouts:

Connection timeout (defaults to 10 seconds): The maximum time an outbound request will wait to establish a connection with the external endpoint. Controlled via the ‘glide.http.connection_timeout’ system property and specified in milliseconds
Timeout (defaults to 175 seconds): The maximum time to wait before an outbound request is timed out. Controlled via the ‘glide.http.timeout’ system property and specified in milliseconds – can be overridden on a per request basis via use of the ‘setHttpTimeout([milliseconds])’ method
ECC response timeout (defaults to 30 seconds): ECC timeout is how long getStatusCode, getBody, or waitForResponse will wait for a response when the executeAsync() method has been used. Note that the ECC timeout will NOT impact the timeout of the HTTP web service request itself nor the timeout of the ecc response handling when setEccTopic() or setEccCorrelator() are being used!
ECC timeout is influenced by a number of factors:
1. The properties "glide.rest.outbound.ecc_response.timeout" and "glide.soap.outbound.ecc_response.timeout" respectively control the specific timeouts for REST and SOAP requests that use executeAsync (i.e., requests that are processed via ECC).
2. If the property "glide.http.outbound.max_timeout.enabled" is set to true (as it is by default), then a max ECC timeout will be enforced by the property "glide.http.outbound.max_timeout". This max ECC timeout is 30 seconds by default and cannot be set to greater than 30 seconds.
3. The above properties can be overwritten by the value passed in to waitForResponse(int timeToWaitMs) which receives its arguments in milliseconds. However, the timeout still cannot be greater than 30 seconds as long as "glide.http.outbound.max_timeout.enabled" is set to true.
Note: Waiting for a response after using executeAsync, as described above, is not ideal. It effectively makes the system synchronous again. The initiating thread must wait for the asynchronous thread to complete.
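The three rules above can be read as a small calculation. The sketch below is illustrative only: the function and option names are hypothetical, and it simply models the precedence described (an explicit wait value beats the property, and the cap wins while it is enabled). It does not read any system property.

```javascript
// All values are in milliseconds in this sketch; that unit is an assumption
// made for consistency, not a statement about how each property is stored.
function effectiveEccWaitMs(opts) {
    // A value passed to waitForResponse() takes precedence over the property.
    var requested = (typeof opts.waitOverrideMs === 'number')
        ? opts.waitOverrideMs
        : opts.propertyTimeoutMs;
    // While the max timeout is enabled, the wait can never exceed the cap.
    if (opts.maxTimeoutEnabled) {
        return Math.min(requested, opts.maxTimeoutMs);
    }
    return requested;
}

// Property asks for 60 s, but the 30 s cap is enabled: the wait is 30 s.
var capped = effectiveEccWaitMs({
    propertyTimeoutMs: 60000, maxTimeoutEnabled: true, maxTimeoutMs: 30000
});
```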

Synchronous requests have some obvious benefits. When triggered, they block the thread waiting for a response from the endpoint. Once this is received, it can be immediately processed. Due to this, they can sometimes be used to populate elements of a user interface with some type of external data. Note, however, that this can be extremely dangerous and should be avoided where possible:

Any latency dealing with an external endpoint directly translates to performance degradation within the initiating thread. If users start spending significant amounts of time waiting for a response to their interactions with the UI, they can quickly become upset.
Each synchronous request corresponds to a blocked thread within the instance. This can quickly cause contention and, in some cases, can cause a ServiceNow instance to become completely inoperable. This is not a good trade off for a slightly richer user interface.

If synchronous requests must be used, ensure that:

The number of synchronous requests is kept as low as possible.
The external endpoint is generally extremely fast to respond, and timeouts are short.

The combination of these factors won’t guarantee issues around synchronous requests are avoided. However, they will help safeguard against disaster. As a final point, if a synchronous request has to be used to populate the UI, consider notifying users that they are waiting for an external system (via use of some kind of icon that appears on the form) - at least they will then have some idea why the instance seems slow.
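Where a synchronous call cannot be avoided, one way to keep the timeout short is to set it per request and fail fast. The sketch below is a hypothetical helper, not a prescribed pattern; the message object would be a RESTMessageV2 or SOAPMessageV2 object on an instance, and the 2000 ms value in the usage comment is an arbitrary example.

```javascript
// Hypothetical helper: synchronous call with a tight per-request timeout.
function callWithShortTimeout(message, timeoutMs) {
    // Per-request override of glide.http.timeout, in milliseconds.
    message.setHttpTimeout(timeoutMs);
    try {
        var response = message.execute();
        if (response.haveError()) {
            return { ok: false, error: response.getErrorMessage() };
        }
        return { ok: true, status: response.getStatusCode(), body: response.getBody() };
    } catch (ex) {
        // Fail fast so the caller can show a fallback instead of hanging.
        return { ok: false, error: String(ex && ex.message ? ex.message : ex) };
    }
}

// On an instance (message and method names are assumptions):
//   var result = callWithShortTimeout(new sn_ws.RESTMessageV2('Inventory API', 'get'), 2000);
```

Returning a plain result object lets the caller decide what to show the user when the external system is slow or unavailable.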

Graphical examples

Now that we understand the different options available for outbound web service design, we can step through some graphical examples showing exactly how they operate. The examples below all reference the ‘RESTMessageV2()’ API. However, the ‘SOAPMessageV2()’ API is functionally identical.

Synchronous requests directly from application nodes

Benefits:

Design is extremely simple - single line of code to initiate request and get returned a response
Response can be processed by the initiating thread

Drawbacks:

Causes the initiating thread to stall waiting for a response
Can quickly cause performance issues / resource contention if the endpoint is slow or many outbound requests are made simultaneously
Does not scale well

Asynchronous requests directly from application nodes with no response handling

Benefits:

Avoids stalling the initiating thread – avoids end user impact if triggered via a user interface operation

Drawbacks:

Subject to lag between scheduled job being created and picked up by an application node / executed on a worker thread
Still stalls a worker thread – a large volume of simultaneous requests to a slow endpoint could cause a backlog in scheduled job processing (scheduler overload)
No way in which results can be handled – purely for fire and forget messages – note that this can be overcome via use of ‘setEccTopic()’ method (shown below)

Asynchronous requests directly from application nodes with response handling

Benefits:

Avoids stalling the initiating thread – avoids end user impact if triggered via a user interface operation
Allows responses to be processed via a business rule

Drawbacks:

Subject to two sets of lag occurring between scheduled jobs being created and picked up by an application node / executed on a worker thread
Still stalls a worker thread

Asynchronous requests via a MID server with no response handling

Benefits:

Does not cause any thread within the instance itself to stall (i.e. low impact to instance – all load shifted to MID servers)
Allows use of an endpoint which is not publicly accessible

Drawbacks:

Subject to lag between outbound ecc_queue record creation and being claimed by a MID server and again between MID server queuing and executing request
Stalls a thread on the MID server
No way in which response can be handled – purely for fire and forget messages – can be overcome via use of ‘setEccCorrelator()’ method (explained below)

Asynchronous requests via a MID server with response handling

Benefits:

Does not cause any thread within the instance itself to stall (i.e. low impact to instance – all load shifted to MID servers)
Allows use of an endpoint which is not publicly accessible
Allows response to be handled (albeit not by the thread which initiated the request)

Drawbacks:

Subject to lag in various places, e.g., between outbound ecc_queue record creation and being claimed by a MID server, between MID server queuing and executing request, between MID server creating inbound ecc_queue record and scheduled job to process being claimed/executed via a worker thread
Stalls a thread on the MID server

Important: setEccTopic() vs setEccCorrelator()

If routed via a MID server, a request using setEccTopic() will fail with the corresponding output ecc_queue record being set to a state of 'error' and the text 'The MID Server code is unable to run this ECC Queue output topic' added by the MID server. The reason for this is that the MID checks that the topic field contains the name of a valid Discovery probe - in the case of an arbitrary topic this will not be the case hence the MID will reject the output record and will not process the request.

Requests which are routed via a MID should use the 'setEccParameter('skip_sensor', 'true')' and 'setEccCorrelator('[arbitrary string]')' methods when they are created. Both of these parameters have an important function:

Instead of modifying the 'topic' on the output ecc_queue record setEccCorrelator() will leave topic as 'RESTProbe' (which the MID recognises) but will add an arbitrary string to the 'agent_correlator' field. The MID will process this request as normal and write a corresponding payload back as an input ecc_queue record with the same value in the 'agent_correlator' field
By default there is an expectation that any input ecc_queue record written by a MID server will have a corresponding Discovery sensor to process that record within the instance. In the case of asynchronous REST requests this is not the case (as processing must be performed, for example, via a custom business rule on the ecc_queue table). Without setting 'skip_sensor = true' the asynchronous REST request will still function as expected; however, the input ecc_queue record will be set to a state of 'error' with the text 'No sensors defined'. Using 'skip_sensor = true' avoids this
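The choice between the two methods can be sketched as a single helper. This is a hypothetical illustration: the helper name and the 'marker' argument are assumptions, and the message object would be a RESTMessageV2 object on an instance.

```javascript
// Hypothetical helper: prepare response handling for an asynchronous request.
// 'marker' is an arbitrary string a business rule on ecc_queue will match on.
function sendAsyncWithResponseHandling(message, marker, midServerName) {
    if (midServerName) {
        // Via a MID server: keep the default topic and tag the record instead.
        message.setMIDServer(midServerName);
        message.setEccCorrelator(marker);
        // Avoids the input record ending in 'error' with 'No sensors defined'.
        message.setEccParameter('skip_sensor', 'true');
    } else {
        // Direct from application nodes: a custom topic can be used.
        message.setEccTopic(marker);
    }
    message.executeAsync();
}

// On an instance (all names are assumptions):
//   var rm = new sn_ws.RESTMessageV2('Inventory API', 'get');
//   sendAsyncWithResponseHandling(rm, 'inventory_sync', 'mid_dmz_01');
```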

To process the response to asynchronous REST requests routed via a MID server (and using setEccCorrelator()) an after insert business rule should be created against the ecc_queue table. This should:

Have a condition similar to the following such that it only processes input records with a specific value in the agent_correlator field:

current.agent_correlator == "[arbitrary string]" && current.topic == "RESTProbe" && current.queue == "input" && current.state == "ready"

Contain code in its script to set the state of the input record to processed:

current.state = "processed";
current.processed = new GlideDateTime().getDisplayValue();
current.update();

Note that it is OK to use current.update() in an 'insert' business rule as this will NOT cause the business rule to be recursive - clearly this type of behaviour should be avoided in an 'update' business rule as the business rule will trigger itself and so become cyclical.
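The condition and script above can be combined into one guarded routine, sketched below. The function names, the 'handlePayload' callback and the 'nowDisplayValue' argument are hypothetical; on an instance the last of these would be new GlideDateTime().getDisplayValue(), and 'current' would be the ecc_queue record supplied to the business rule.

```javascript
// Guard mirroring the condition above; works on any object with these fields.
function isExpectedResponse(record, correlator) {
    return String(record.agent_correlator) === correlator &&
        String(record.topic) === 'RESTProbe' &&
        String(record.queue) === 'input' &&
        String(record.state) === 'ready';
}

// Hypothetical business rule body. 'handlePayload' is a placeholder for
// whatever processing the integration needs.
function processResponse(current, correlator, handlePayload, nowDisplayValue) {
    if (!isExpectedResponse(current, correlator)) {
        return false;
    }
    handlePayload(String(current.payload));
    // Mark the input record as processed so it is not picked up again.
    current.state = 'processed';
    current.processed = nowDisplayValue;
    current.update();
    return true;
}
```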

The pitfalls of waiting for a response from asynchronous requests

NOTE: While the diagram below includes a MID Server, the guidance in this section applies to all use cases of SOAPMessageV2 and RESTMessageV2, regardless of whether you use a MID Server or not. TL;DR: waiting for an asynchronous response makes it synchronous. If you don't want that, then use the setEccTopic() or setEccCorrelator() methods described above.

Generally, using asynchronous requests is a preferred approach and increases the likelihood of a consistently performing scalable instance. Be aware, however, that if any of the following methods are used after executeAsync, then you will be effectively making your request synchronous again:

waitForResponse()
getBody()
getStatusCode()
and so on

Triggering an asynchronous request, then calling ‘waitForResponse()’ is analogous to using a synchronous request in the first place. If it can be avoided, then it probably should be!

You see, when executeAsync is used, a separate thread is spawned to handle the actual call to the 3rd party web service. This has the benefit of allowing the initiating thread to proceed without waiting for the response. However, if you then tell your code to wait for a response via getStatusCode, getBody, or waitForResponse, you lose that performance benefit. Your initiating thread will have to wait for completion of the asynchronous thread that it had spawned.

Be aware that if an asynchronous request is triggered and one of the above methods is then called (essentially making the request synchronous), an additional set of timeouts comes into play, dictating how long the thread will wait for a response (i.e. an input record) to appear in the ECC queue (ecc_queue) table:

REST: glide.rest.outbound.ecc_response.timeout: Specified in milliseconds and defaults to 30 seconds
SOAP: glide.soap.outbound.ecc_response.timeout: Specified in milliseconds and defaults to 30 seconds

Both of the above can be overridden at run time via use of the 'waitForResponse([seconds])' method. Their aim is to avoid threads stalling for too long (or potentially indefinitely) waiting for some kind of response from an external web service.

Choosing an appropriate design

As this document describes, there are many ways in which to design outbound integrations and, whilst asynchronous requests are probably favorable, there is no perfect solution. All designs lead to stalled threads somewhere and there are various pros and cons around handling responses. Ultimately, the most important aspect to consider is where the instance can tolerate delays and the effects this will have at scale. For example, synchronous requests will stall 'front end' components of the instance (i.e. default/API_INT/worker threads). This may be OK in a test environment. But could it cause issues in an instance processing hundreds of thousands of transactions every day? Likewise, asynchronous requests using a MID server will generally stall threads on MID servers - this takes issues away from the instance itself. But how will MID servers cope and could this impact other MID functionality such as Discovery?
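As a rough aid only, the options discussed in this article can be laid out as a small decision function. Everything in it is hypothetical: the function name, the input flags and the returned labels are just a way of summarising the five designs, and it ignores scale, endpoint latency and MID server capacity, which ultimately matter most.

```javascript
// Hypothetical summary of the designs discussed in this article.
function suggestDesign(req) {
    // Endpoints that are not publicly reachable can only be called via a MID server.
    var viaMid = !req.endpointPublic || !!req.offloadToMid;
    if (req.mustBlockForResponse) {
        // Only when the caller genuinely cannot continue without the response.
        return viaMid ? 'synchronous via MID server'
                      : 'synchronous from application node';
    }
    if (!req.needsResponse) {
        return viaMid ? 'asynchronous via MID server, no response handling'
                      : 'asynchronous from application node, no response handling';
    }
    return viaMid ? 'asynchronous via MID server with setEccCorrelator()'
                  : 'asynchronous from application node with setEccTopic()';
}

// A private endpoint whose response must be handled later:
var suggestion = suggestDesign({ endpointPublic: false, needsResponse: true });
```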

Hopefully the discussion and examples in this document will help promote 'balanced' designs where potential pitfalls are understood and can be worked around if necessary.


Jørgen Blakstad · 05-31-2022

Hi!

Thanks for a great article!

If the business rule that handles "setEccCorrelator" is scoped, the following needs to be changed:

From:

current.processed = gs.nowDateTime();

To:

current.processed = new GlideDateTime().getDisplayValue();

Jørgen

Ken Adachi · ‎08-29-2022

Hi,

I noticed there is a discrepancy between this article and a similar KB (https://support.servicenow.com/kb?id=kb_article_view&sysparm_article=KB0694711).

The KB states the following:

ECC Timeout

The ECC timeout value is the amount of time that an asynchronous request will wait for the response to show up in the ECC Queue. This timeout applies to all asynchronous requests, both with a MID Server and without. However, this does not apply to synchronous requests because they do not use the ECC Queue.

On the other hand, this article states that "This timeout only applies to synchronous requests executed via a MID server".

GTSPerformance · ‎08-30-2022

@Ken Adachi

Wow, thanks for the great callout! I'll update both documents with the following corrected description:

ECC Timeout

ECC timeout is how long getStatusCode, getBody, or waitForResponse will wait for a response when the executeAsync() method has been used. Note that the ECC timeout will NOT impact the timeout of the HTTP web service request itself nor the timeout of the ecc response handling when setEccTopic() or setEccCorrelator() are being used!

ECC timeout is influenced by a number of factors:

1. The properties "glide.rest.outbound.ecc_response.timeout" and "glide.soap.outbound.ecc_response.timeout" respectively control the specific timeouts for REST and SOAP requests that use executeAsync (i.e., requests that are processed via ECC).
2. If the property "glide.http.outbound.max_timeout.enabled" is set to true (as it is by default), then a max ECC timeout will be enforced by the property "glide.http.outbound.max_timeout". This max ECC timeout is 30 seconds by default and cannot be set to greater than 30 seconds.
3. The above properties can be overwritten by the value passed in to waitForResponse(int timeToWaitMs) which receives its arguments in milliseconds. However, the timeout still cannot be greater than 30 seconds as long as "glide.http.outbound.max_timeout.enabled" is set to true.

Note: Waiting for a response after using executeAsync, as described above, is not ideal. It effectively makes the system synchronous again. The initiating thread must wait for the asynchronous thread to complete.

Best regards, Your Global Technical Support Performance team.

GTSPerformance · ‎08-30-2022

Thanks Jørgen,

We've updated the article.

Jodi Syverson · ‎07-21-2023

Hello. Are there any tips regarding setting up an integration Flow to use the Connection Timeout from an alias Connection instead of the global property glide.http.timeout? We seem to have some challenges with it "sticking".
