11-05-2020 02:39 AM - edited 05-16-2023 02:30 AM
Overview
This guide is written by the ServiceNow Technical Support Performance team (All Articles). We are a global group of experts that help our customers with performance issues. If you have questions about the content of this article we will try to answer them here. However, if you have urgent questions or specific issues, please see the list of resources on our profile page: ServiceNowPerformanceGTS
One of the areas of the ServiceNow platform which doesn't appear to be particularly well understood (or documented) is table rotation. This article attempts to demystify this concept and discusses its relationship to table cleaner. In some cases this might help improve the performance/scalability of new ServiceNow implementations and, even if it doesn't apply directly to the reader, it's always good to understand how things work under the covers.
First, let's explore the problem that table rotation attempts to address:
- Certain parts of the ServiceNow platform (think of logging as an example) tend to create huge data sets
- All log messages are written to a single table (syslog)
- It's reasonable to assume that users might actually want to query this data to troubleshoot issues
This, however, creates a potential issue - relational databases don't tend to perform well with huge data sets:
- A SQL query executed against a table containing millions (or even billions) of rows is likely to be slow
- A good database indexing strategy can help with this but only to a point - even with really good indexes data sets eventually grow so large that query performance still becomes frustratingly slow
It's reasonable to think that a simple solution is just to delete old data to keep tables at a manageable size - here too, however, there are pitfalls:
- Certain data (think of email as an example) simply cannot be deleted - many customers have legal requirements meaning that this data has to be stored for multiple years
- Relational databases simply aren't that good at deleting rows of data - depending on how this is accomplished it can cause:
- Database tables to be locked for periods of time (an instant outage in the case of certain tables)
- Rows simply being marked as deleted (and not physically removed from underlying storage). Over time this causes fragmentation of tables, impacting query performance, and a slow but inexorable increase in database size
How table rotation helps to deal with these issues:
Table rotation is configured via records in the sys_table_rotation table (accessible via 'System Definition -> Table Rotations'). In this table there will be a row for every table in the platform against which table rotation has been configured. One important field is 'Type' - this can be set to 'Rotation' or 'Extension' - for now let's consider what 'Rotation' means and come back to 'Extension' a little later.
An example of an out of box table against which rotation is configured is syslog - if you open this record you will see the configuration (type 'Rotation', 8 rotations, duration of 7 days).
Let's expand on what this means:
- The platform has a logical table named syslog which, via the user interface, you can interact with as you would expect
- This logical table, however, has been configured as type 'rotation' with 8 'rotations' and a duration of 7 days
- This means that instead of there being a single syslog table in the underlying database there are in fact 9 tables (the base table plus 8 'rotations') - note that within ServiceNow we often refer to these rotations as 'shards' or 'subtables'
Details of all the physical tables used in a rotation are held in the 'sys_table_rotation_schedule' table - let's look at this more closely, again using the example of syslog.
The first thing to note is that the base table (syslog) generally isn't used as part of the rotation, so it normally wouldn't be expected to hold any data.
It is used, however, as a 'template' for how each of the shards is structured. To explain further - let's say a change needs to be made to the syslog table (e.g. increasing the length of a field). An administrator visits the user interface, makes the required change, and takes a well-earned break expecting that their work is complete. Note, however, that it isn't:
- The structure of the base table (syslog) will have been modified
- The structure of all of the shards (syslog000[x]) will still be as they were
- Any data written to the shards will therefore not be able to take advantage of the changes to the base table which might have unexpected consequences!
To enforce the change to the base table across all of the shards the administrator has to also visit the corresponding 'Table Rotation' record and click the 'Synchronize shards' UI action which then propagates out the changes.
The next thing to note is that each shard has 'valid_from' and 'valid_to' dates and times. These dictate when the platform will physically write data to each of the tables. For example:
- If the current date is Thursday 5th November 2020 then, right now, the platform will be recording log messages in shard syslog0003 (as the current date/time falls within its valid_from/valid_to window)
- Let's say we want to look at log messages from Monday 12th October 2020 - again using valid_from/valid_to it's clear that these will exist in shard syslog0007
Finally, each shard has a field called 'offline' which is set to true or false - this dictates whether data from that table can be queried via the user interface. In the above example, records from shard syslog0005 cannot be viewed via the user interface as that shard is considered offline (note that if it's absolutely necessary to retrieve data from an offline shard, support may be able to do this via the back end database).
What all of the above means is that when table rotation is configured:
- A number of shards will be created (depending on the 'rotations' defined in the sys_table_rotation record)
- Each shard will be given a valid_from and valid_to date/time
- The platform inserts data into a specific shard depending on the current date/time and the valid_from/valid_to of each of the shards
- As a result data is logically distributed across shards depending on the date/time each individual row was written by the platform
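As an illustration, the shard-selection behaviour described above can be modelled in a few lines of Python. This is a simplified sketch, not the platform's actual implementation - the field names mirror sys_table_rotation_schedule, but the `make_shards` and `shard_for` helpers and the data are invented for this example:

```python
from datetime import datetime, timedelta

# Simplified model of sys_table_rotation_schedule entries.
def make_shards(base, rotations, duration_days, start):
    return [{"name": f"{base}{i:04d}",
             "valid_from": start + timedelta(days=i * duration_days),
             "valid_to": start + timedelta(days=(i + 1) * duration_days)}
            for i in range(rotations)]

def shard_for(shards, when):
    """Return the shard whose valid_from/valid_to window covers 'when'."""
    for s in shards:
        if s["valid_from"] <= when < s["valid_to"]:
            return s["name"]
    return None  # outside every window - in reality the rotation would have moved on

shards = make_shards("syslog", 8, 7, datetime(2020, 10, 1))
print(shard_for(shards, datetime(2020, 10, 13)))  # syslog0001
```

Every insert lands in exactly one shard because the validity windows are contiguous and non-overlapping.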
The final piece of the puzzle is what happens when the rotation runs out of shards - i.e. using the example above, the last valid_to date/time is 18th November - what happens on the 19th? In summary, when the platform moves between shards, two things happen to the oldest (and offline) shard:
- All of its data is removed such that it is empty
- Its valid_from/valid_to dates are set to the next week in the future
By doing this, table rotation always stays one shard ahead of where it's currently writing, meaning that switching to the next shard (when that date/time arrives) is seamless.
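That rollover step can be sketched as follows - again an illustrative Python model with invented data and a hypothetical `rotate` helper, not the real implementation (the actual emptying happens as a database-level truncate):

```python
from datetime import datetime, timedelta

# Two shards of an invented rotation: one active, one empty/next.
shards = [
    {"name": "syslog0000", "valid_from": datetime(2020, 11, 4),
     "valid_to": datetime(2020, 11, 11), "rows": 500},
    {"name": "syslog0001", "valid_from": datetime(2020, 11, 11),
     "valid_to": datetime(2020, 11, 18), "rows": 0},
]

def rotate(shards, duration_days=7):
    """Empty the oldest shard and push its validity window past the newest,
    keeping the rotation one empty shard ahead of the active one."""
    oldest = min(shards, key=lambda s: s["valid_from"])
    newest = max(shards, key=lambda s: s["valid_to"])
    oldest["rows"] = 0  # modelled as a whole-table truncate, not row deletes
    oldest["valid_from"] = newest["valid_to"]
    oldest["valid_to"] = newest["valid_to"] + timedelta(days=duration_days)

rotate(shards)
print(shards[0]["name"], shards[0]["rows"])  # syslog0000 0
```

After the call, syslog0000 is empty and valid for the 18th-25th of November - one window ahead of the shard currently being written.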
Now that we know how rotations physically work, let's talk about the potential benefits:
Query performance: Table rotation's use of shards means that data will be logically distributed across a number of physical tables - this can give significant benefits:
- Let's say our syslog table contains 80m records (a whole lot of data) - a single query running across all of these records might therefore be expected to be slow.
- Each shard, however, will likely contain around 10m records (a much smaller set of data) meaning that query performance is potentially much improved assuming SQL queries only target a single shard
Data deletion: As touched on previously, deletion of individual rows of data has various pitfalls. Table rotation, however, only ever deletes a whole shard's (or table's) worth of data at once:
- It doesn't need to do row by row deletion and instead can just truncate the corresponding shard
- This is a far better approach and avoids a number of the concerns associated with row based deletion
As with everything, however, there are some trade-offs - the biggest is SQL queries which have to traverse multiple shards. This behaviour is fully supported - the platform will simply UNION results from each shard - however this does have an implied performance hit. For example, if a list of all records from the syslog table is loaded via the user interface, a single query UNIONing every shard will be executed - ouch!
Queries running against a table using rotation should always define a relatively short window of time based on sys_created_on (as this is the field governing rotation) - this is known as 'time boxing' and ensures that the query will only hit a small number of shards (hopefully only a single shard), giving much improved query performance.
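To see why time boxing matters, here is an illustrative sketch of how a query window maps onto shards (the `make_shards` and `shards_hit` helpers are invented for this example; this is not platform code):

```python
from datetime import datetime, timedelta

def make_shards(base, rotations, duration_days, start):
    return [{"name": f"{base}{i:04d}",
             "valid_from": start + timedelta(days=i * duration_days),
             "valid_to": start + timedelta(days=(i + 1) * duration_days)}
            for i in range(rotations)]

def shards_hit(shards, query_from, query_to):
    """Shards whose validity window overlaps the queried sys_created_on range."""
    return [s["name"] for s in shards
            if s["valid_from"] < query_to and s["valid_to"] > query_from]

shards = make_shards("syslog", 8, 7, datetime(2020, 10, 1))
# No time box: every shard has to be UNIONed together.
print(len(shards_hit(shards, datetime(2020, 10, 1), datetime(2020, 11, 26))))  # 8
# Time boxed to a 2-day window: only a single shard is queried.
print(len(shards_hit(shards, datetime(2020, 10, 12), datetime(2020, 10, 14))))  # 1
```

The narrower the sys_created_on window, the fewer physical tables the resulting SQL has to touch - which is the whole point of time boxing.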
How table extensions differ:
Let's move on to table extensions. At the top of this article it was mentioned that customers might need to retain certain data sets for many years or even indefinitely - at a glance, table rotation doesn't help with this as that strategy is designed to constantly recycle tables used for short lived/transient data. This is where extensions come in.
Fundamentally extensions are extremely similar to rotations:
- They are defined in exactly the same place as table rotations (the sys_table_rotation table) - in this case, however, the rotation type is set to 'Extension'
- They are based around a base table and a number of shards
- They are given a duration which determines the period of time across which each shard is written to
Note, however, that the shards in a table extension do not rotate - instead of the oldest shard being truncated and re-used, an extension simply creates new shards indefinitely, such that data remains logically separated across shards by creation date and no data is ever deleted. The audit (sys_audit) table is an extension in many customer instances, which means that, over time, the logical audit table is backed by a whole number of physical shards.
NOTE: The sys_audit table is not extended out-of-the-box and, in general, it is not recommended by ServiceNow to enable extension on the sys_audit table. Having the sys_audit table on extension can increase the time it takes to build a "History Set" (see KB0791687).
As with table rotations, any newly created shard will use the base table as its template. In addition, it's still necessary to execute SQL queries which are time boxed - imagine what a SQL query across all of the above instance's sys_audit shards would look/perform like!
The only other point to consider with table extensions is that no data will ever be deleted from any of the shards in an extension - every extension you configure will grow indefinitely! Also, deletion of data from an extension can be tricky (covered further below) - if you are considering configuring an extension, make sure that:
- Business requirements really justify an extension (i.e. data in the extension must be kept indefinitely)
- The only data written to the extension is data which really must be kept (as otherwise the size of your instance will continually grow at an elevated rate)
Regardless of the approach which is used table rotations and extensions can be an extremely powerful tool in handling large data sets.
Choosing between rotation or extension:
- Rotation or extension?
- If the data being written is transient/only needs to be kept for a relatively short period of time use a rotation
- Extensions should only ever be used for data which must never be deleted
- Length of duration?
- Determine the average rate at which data will be written to the rotation/extension and the largest data set which is likely to give reasonable SQL query performance
- Let's say your application writes 1m records per day and a reasonable data set across which to query is 10m records
- Duration should therefore be set to 10 days (or lower) to maintain good SQL query performance
- Number of rotations?
- Obviously this only applies to table rotations and really depends on how long data needs to be kept/queried.
- Remember that one shard will always be empty (as it's the next shard to be written to) and another will always be offline (it cannot be queried via the user interface).
- As an example, let's say shard duration is 10 days and data must be available to be queried for 50 days - in this case 7 rotations would be required (5 shards which can be queried, 1 offline shard, and 1 truncated shard)
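The sizing rule of thumb above can be captured in a small helper - an illustrative sketch (the `rotations_needed` function is invented here) that ignores partial-shard boundary effects:

```python
import math

def rotations_needed(duration_days, retention_days):
    """Shards needed: enough queryable shards to cover the retention window,
    plus one offline shard, plus one empty shard waiting to be written to."""
    queryable = math.ceil(retention_days / duration_days)
    return queryable + 2

print(rotations_needed(10, 50))  # 7 (5 queryable + 1 offline + 1 empty)
```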
How table cleaner fits into the puzzle:
Let's start with some important notes:
- Table cleaner is not supported with tables using extension / rotation - table cleaner rules can be configured against such tables but they will be silently skipped
- There is no benefit in configuring table cleaner against a rotated table anyway as, due to the way in which rotation works, any data stored in a rotated table is by definition transient
- There might be scenarios where old data needs to be cleaned up from an extended table however this is not possible with table cleaner. In this scenario some kind of other approach (such as using a script) will be required. Note that even if data is removed from an extended table (or any other table) space on disk will not be reclaimed. Instead free space will be created within the table for use by future changes to data (i.e. inserts). In the case of an old extension shard there will never be any new inserts so this free space is useless. To physically reclaim space on disk the table must be rebuilt so a case should be opened with ServiceNow support after data has been deleted.
- Table cleaner deletes with 'setWorkflow(false)' meaning that business rules, workflows, and flows which might be expected to trigger on record deletion will not in the context of table cleaner. This is important if you have business logic which depends on this type of functionality
Let's talk more about how table cleaner operates:
- Table cleaner is a scheduled job which runs, by default, once per hour - its purpose is to delete older records from tables in the instance thereby keeping table sizes (and query performance) manageable
- Tables against which to run/records to delete are governed by table cleaner rules (defined in the sys_auto_flush table) - each rule specifies:
- The target table (i.e. which table to delete from)
- A 'matchfield' - this must be a date/time field which can be compared to the current date/time
- Age in seconds - the delta between the value in a record's 'matchfield' and the current date/time above which the record becomes a candidate for deletion
- Conditions - a filter which allows further definition of exactly which records should be deleted (for example 'active = false AND state=closed')
- Cascade delete - this determines whether records which reference the table / records being deleted are also considered for deletion (depending on their cascade delete setting). By default this is set to false so cascade deletes will not take place. Enabling cascade deletes may avoid dangling references being left in other tables however may negatively impact the throughput of table cleaner deletions
An out of box example is the table cleaner rule against the sp_log table - this deletes any records where sys_created_on is more than 90 days old (note that no condition is provided so there is no further filtering of records to remove).
Each time the table cleaner job starts it iterates over active table cleaner rules deleting records from corresponding tables. Note that for each table it will generally:
- Get a count of all records in the table
- Get a count of records which are candidates for deletion
- Start to delete the records (this is generally done in 'batches' of sys_ids for optimal performance)
This strategy does have a number of potential pitfalls:
- If the table is extremely large, obtaining the count of all records may take an excessive amount of time. Note, however, that in recent releases the SQL query to get this record count has improved (it now estimates total rows in the table), so this is less of a concern than it used to be
- If the table does not have a supporting index on the matchfield column, obtaining a count of records which are candidates for deletion or obtaining sys_ids for deletion can take an excessive amount of time
This means that table cleaner does not perform well against extremely large tables (i.e. it's maybe not a good candidate to use against a large sys_audit table), and it is an absolute requirement to have a supporting database index on the physical column used as the matchfield. If these requirements are not met, table cleaner will still run, however:
- It might trigger extremely slow/expensive database queries which can have the potential to impact underlying database server CPU
- It may spend a huge amount of time determining/deleting records - by default table cleaner will only spend 20 minutes maximum deleting from a single table so if queries are slow the volume of records deleted in 20 minutes may be small
That being said, table cleaner is an extremely good strategy for regularly deleting relatively small numbers of records from a table to keep overall table size under control.
One final point is that, by default, table cleaner will track the volume of data removed from a table. If this is greater than 50% of all rows then table cleaner will trigger a compaction of the table. In short, this reorganises the physical storage of table/index data and removes fragmentation, again helping to keep performance of SQL queries against the table as expected. If, for some reason, compaction is not required, the sys_dictionary collection record for the table can have the 'no_optimize=true' attribute added. Note that with compaction disabled table cleaner will skip getting a total count of records in the table - in some cases (where the table is extremely large) this might help avoid an expensive/slow SQL query, but it does mean that table cleaner will never compact the table.
Note that table cleaner is relatively configurable - there are various system properties controlling aspects such as the percentage of records which have to be removed before triggering table compaction, how long it will run against a single table, and so on. All combinations are too lengthy to list here - generally the out of box defaults work well, however if there is a need to tune them, support should be your first port of call.
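The compaction trigger itself is simple arithmetic - a hedged sketch (the `should_compact` helper is invented; 50 is the default threshold, which is configurable via a system property):

```python
def should_compact(total_rows, deleted_rows, threshold_pct=50):
    """Trigger compaction when the share of rows deleted in a single cleaner
    run exceeds the threshold percentage (50 by default)."""
    return deleted_rows * 100 > total_rows * threshold_pct

print(should_compact(1_000_000, 600_000))  # True
print(should_compact(1_000_000, 400_000))  # False
```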
A closing comment:
Hopefully this article helps to at least explain the purpose of table rotation, extension, and table cleaner. Unfortunately these are complex topics so, as a final note, if changes are being made in these areas always test them first in a sub production instance and if there is any doubt call support first. Support engineers are generally much happier discussing the best way to achieve a good outcome than dealing with the aftermath of unintended data loss!
---
Here's a tiny URL if you want to link this post on social media: https://tinyurl.com/yyxa2n4g
---
Great and very well-written explanation. Thank you for sharing this!
---
Extremely Helpful, Very well explained as well.
---
Awesome article!
Just one question, is sys_audit table setup as table extension Out-of-Box? I have checked a few Quebec instances but don't see sys_audit table listed in Table Rotation module.
---
Hi Johnny,
You are right, sys_audit is not on Table Extension out-of-the-box. I'll update this document to reflect that explicitly.
A lot of older, larger customers do have it on extension. I believe in the past, under some conditions it has been considered a performance gain, however, at this point (Sep 29, 2021), the consensus from ServiceNow development is that having Table Extension on the sys_audit table is generally not desirable.
Thanks for the comment! I noticed that some of our product documentation also needs to be updated to reflect the current state and I've submitted documentation feedback requests to get those up-to-date.
Best Regards,
Your Global Technical Support Performance team
---
Thanks for that.
---
Great article!
Just two questions.
1 - If we create sys_auto_flush records for five tables and capture them in an update set, then commit the update set into a higher environment, how will the TableCleaner identify which table's records to delete first when it runs? And will it delete all records for that table at once?
2 - Suppose we have a table (Table A - here incident_task) which is dependent on another table (Table B - here incident). If the table cleaner picks up 'Table A - incident_task' first to delete the data, how will this work, since 'Table B - incident' records are not yet deleted and hence the condition will not be matched? What will happen in this case?
---
Hello
To answer your questions:
1.) How will [TableCleaner] identify which table record to delete first? And Will it delete all records for that table at once?
A. To answer the first part of your question, the order of execution of sys_auto_flush records by the TableCleaner job is arbitrary. You can't control which table gets cleaned first, because the GlideRecord query that selects all active sys_auto_flush records to execute is hard-coded in Java and does not have any explicit sort order. Without any instructions about how to sort results, a database will always return results in the same arbitrary order. That arbitrary order might change, however, if you move to a different database.
To answer the second part of your first question, no, Table Cleaner will not necessarily delete all records for that table at once. The default behavior for the Table Cleaner job is to clean in batches limited by execution time per sys_auto_flush entry. It will start cleaning the oldest records first, 800 records at a time, until it has either cleaned all matching records or run for more than 20 minutes. If Table Cleaner has tried to clean a single table for more than 20 minutes then, after it finishes a batch, it will stop cleaning that table and move to the next table. The next time Table Cleaner runs again, it will resume deletion from that table where it left off, with the oldest record that matches the sys_auto_flush conditions. There are some exceptions to this rule. For example, if the sys_dictionary.attributes field of the collection record of a table has the attribute iterativeDelete=true, then the 20 minutes execution time limit will not be enforced. Using iterative delete is very inefficient, but some out-of-box tables need it to run in order to support running business rules or script engines during Table Cleaner.
2.) When there are dependencies between two tables, Table Cleaner should rely on the behavior of cascade_delete field in System Dictionary. See Cascade Delete Rules in our product documentation. If you want to bypass that behavior, there is a boolean field on the sys_auto_flush table called "cascade_delete". If you set that to false, then Cascade Delete Rules for dependent records will not trigger during Table Cleaner.
Regards, the ServiceNow Performance Support team
---
Very good article, thanks for sharing.
What would be the tool we would use to cleanup very large tables since Table cleaner wouldn't be appropriate?
Thanks
---
@GTSPerformance : I have a question regarding Cascade Delete on Table Cleaner:
Let's assume I have
- Table A, for which I set up sys_auto_flush entry
- Table B, with Reference field to Table A, Cascade Rule set to Delete
- Table C, also with Reference field to Table A, Cascade Rule NOT set to delete
In a previous comment you stated: "If you want to bypass that behavior, there is a boolean field on the sys_auto_flush table called "cascade_delete". If you set that to false, then Cascade Delete Rules for dependent records will not trigger during Table Cleaner."
This indicates if I uncheck cascade_delete on sys_auto_flush, neither records in B nor C should be cascade deleted. When I check it, only records in table B should be deleted.
This sounds valid to me, but the tooltip for the sys_auto_flush cascade_delete field indicates something different: "Cascade the table cleanup to any records that reference the records being cleaned."
Based on that wording, it would mean that if i check cascade_delete on sys_auto_flush also records in Table C may be deleted - as technically, they refer to recording being cleaned in table A. So would that be the case or is it simply bad wording of the Tooltip?
---
hello!
Very good explanation, thank you very much for sharing it.
I have a question, I work in an instance with a huge volume of data and we are always looking at which tables we can reduce in size.
The syslog and sys_transaction tables are rotated tables that in our case are close to 2 Tbytes, both have the box setting of 8 rotations and 7 days.
We would like to change the rotations from 8 to 4, for two reasons, the first is that we could reduce the size by half, which is very good for our instance, and on the other hand, 4 rotations (4 weeks) seems sufficient for troubleshooting the majority of the incidents.
my questions are:
Does this change have any impact on the instance, or can it be done at any time?
By reducing the number of rotations, does the system eliminate the remaining shards? We know that space recovery does not occur until the table is dropped - does this happen automatically?
Thanks a lot.
---
Hello sergiofeito,
Thanks for your comment! We're glad this article has been helpful. To answer your question, reducing the number of table rotations is not supported. We recommend changing the duration of the table rotations instead. Reducing duration is supported and has no impact to implement. The space will be recovered at the time each rotation gets truncated - i.e. when it is the oldest rotation.
Regards, your ServiceNow Performance Team
---
Thanks for your comment! You asked about what happens when cascade_delete on sys_auto_flush is not checked. I believe your understanding of our article is the following:
Assuming the following scenario:
- Table A, for which I set up sys_auto_flush entry
- Table B, with Reference field to Table A, Cascade Rule set to Delete
- Table C, also with Reference field to Table A, Cascade Rule NOT set to delete
Expected:
1. If I uncheck cascade_delete on sys_auto_flush, neither records in B nor C should be cascade deleted.
2. If I check it, only records in table B should be deleted.
Your understanding of our article is the correct understanding. Sorry that the wording of the Tooltip has caused confusion. I can see how the Tooltip seems to imply that all reference field relationships will change from their configured cascade delete behavior (clear/delete/restrict/none) to follow only delete behavior. I believe the intended meaning of the Tooltip is that when the cascade_delete field on sys_auto_flush is checked, the entire Cascade Delete behavior (clear/delete/restrict/none) - which includes not deleting in the case of table C - follows the standard cascade cleanup behavior. That is, in fact, how the product works.
Hope this is helpful, thanks again for your comment!
Sincerely, ServiceNow Global Technical Support Performance Team
---
Hi Kilo,
great!!
It is another way to achieve the same result, it is perfect for us.
I have a case opened with this, I´ll update the case with your comments.
thanks again.
regards,

---
I have a question - We have an older instance - been around since maybe 2009 - and we are in fact extending the sys_audit table, and we are up to sys_audit0124 - which is A LOT of data, and when looking at our database footprint, there are all around 50GB (lately from what I gather). We do NOT need to have data going back to 2009, but maybe the last 5 years perhaps.
Real question is - does this actually affect performance of the instance with these large audit tables?

---
This is great content, thank you!
One question on the compaction of the table (when > 50% of the data has been deleted): does this have a big impact on the overall performance and do we have control over when this happens? Thanks.
---
Yes, compacting a table can be very impactful - especially for large, frequently accessed tables. If more than 50% of a table is deleted in a single run of Table Cleaner, then compaction is triggered. Under the hood, the database executes the OPTIMIZE operation. This can cause locks at the database layer, freezing related transactions or jobs until it completes. You can stop it from happening completely by adding the 'no_optimize=true' attribute to the sys_dictionary collection record for the table that you do not want to have compacted. You can also change the default threshold for when compaction should be triggered with the property, glide.db.optimize.threshold. It defaults to 50. Raising the threshold would require a greater percentage of the table to be deleted before compaction will kick in.
[Edit Feb 27, 2023: In most scenarios ServiceNow no longer uses the MySQL OPTIMIZE operation to perform compaction. We now use a proprietary backend process we call chunk copy to keep the table online/lock-free while we rebuild it. In some cases chunk copy can create replication lag, but it does have protective logic to throttle updates if replication lag occurs, allowing replication to catch up. In summary, table compaction is now much less impactful. I believe chunk copy has been the default setting since the Orlando release.]
---
You mentioned that you have the sys_audit table on table extension and asked if there is any performance impact. I don't believe there will necessarily be significant impact. Unless you are experiencing extremely slow performance from queries related to the sys_audit table I wouldn't worry about it.
Under normal circumstances, the only place where the audit table gets queried is when building a History Set. History Sets are built when, for example, a user with a certain language/timezone/domain combination tries to view the Activity Stream of an Incident. A History Set is a transient record of information about a target record including journal entries, changes to audited fields and related emails. The process of building a History Set can become very slow when there are a large number of audit entries associated with a single record. I imagine that this might be somewhat exacerbated if, for example, there was a record that has been in your system for many years and has accumulated many thousands of sys_audit records spread over many different table extension "shards". The resulting UNION query would probably be slower than a similar query against a monolithic table. The data access pattern for the sys_audit table is to query by the "documentkey" field that stores the sys_id of the target document. This makes the data retrieval itself quite fast because it executes as a direct equivalency search against an indexed field. Even supposing you must UNION together dozens of sys_audit "shards", I imagine the database could serve that query very efficiently since each shard would be treated as an individual result set/execution plan and each could follow the same strategy of searching against the documentkey field.
Sincerely, ServiceNow Global Technical Support Performance Team

This is exactly what I've been looking for! Thank you for sharing such a comprehensive and informative resource on table rotation & extensions.
@GTSPerformance : I have a follow-on question to the information you provided Michael-PHL. Specifically, we too are an older customer who has table extension set for sys_audit. We have data going back 10+ years in some of the extended tables and we just don't need it. In fact, we were asked by our account rep to buy more storage and these tables are the root cause.
My question is: what would be the best method to clear out these tables? I have used Table Cleaner for other large tables, but your article indicates that won't work on these tables. Do I use the drop function from TableUtils() for this? Any other suggestions?
Fantastic article by the way!
Thanks,
Rich
@RichP-InComm, thanks for your question and complimenting the article! Currently we are not aware of a best practice approach to cleaning the sys_audit table. It is not cleaned out-of-box.
As mentioned in the article above, and as you noted, using Table Cleaner for sys_audit is risky. For one thing, Table Cleaner runs a COUNT(*) operation against the target data set at the start of each execution, and that query can be quite slow against a large table like sys_audit. It is critical that the right indexes are in place before such a Table Cleaner is activated. There would also probably need to be some kind of manual cleanup before activating the Table Cleaner, so that its first run does not face a backlog of 3M records or so to clean - that would make the COUNT(*) operation slow.
I have theorized in the past that shifting sys_audit from an "Extension" model to a "Rotation" model might be a good solution for cleanup, provided that you are okay with actually deleting all audit records older than a certain age. This would have the advantage of immediately reclaiming disk space whenever the oldest Rotation "shard" is truncated. I don't know whether this is even possible, and it should only be attempted with guidance from ServiceNow.
Another thing to watch out for: if audit records are removed for a Journal field entry while the target record (e.g. an Incident) has not been deleted, the Activity Stream will be missing data for that Incident, since the Activity Stream is backed by the sys_audit table. Usually customers have requirements to only remove audit data for fields that are no longer being audited, or for target records that have already been deleted/archived.

For example, I recently helped a customer with a cleanup where we deleted all sys_audit, sys_audit_relation and sys_audit_delete records for fields on the task_sla table that were being excessively audited (please don't turn on auditing for your entire task_sla table!). We used a custom script that worked its way through the task_sla table and queried the sys_audit table for the relevant audit records using the documentkey field (an indexed field on sys_audit, so it is very quick to retrieve records). We deleted the sys_audit records in batches of 500 by their sys_id. Deleting by sys_id is what the Table Cleaner does - it avoids what is known as "gap lock".

Remember, once you do the deletion (whether by manual script or by Table Cleaner) you won't get the actual disk space back until the table is compacted. I believe there is a feature in Washington that will make table compaction a more automatic process, but I haven't seen it in action yet.
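For anyone curious what such a cleanup script might look like, here is a rough sketch of the batched-delete approach described above, written as a ServiceNow server-side (background) script. Treat it as an illustration only, not a supported procedure: the batch size, the lack of any query conditions on task_sla, and the helper function name are all assumptions, and anything like this should be tested on a sub-production instance first:

```javascript
// Illustrative sketch only - adapt the conditions to your data and
// test on a sub-production instance before running anywhere important.
// Walks task_sla records and deletes their sys_audit entries in small
// batches by sys_id, mirroring Table Cleaner's gap-lock-avoiding strategy.
var BATCH_SIZE = 500;

var sla = new GlideRecord('task_sla');
sla.query();
while (sla.next()) {
    var ids = [];
    // documentkey is indexed, so this lookup is cheap even on a huge table
    var audit = new GlideRecord('sys_audit');
    audit.addQuery('documentkey', sla.getUniqueValue());
    audit.query();
    while (audit.next()) {
        ids.push(audit.getUniqueValue());
        if (ids.length === BATCH_SIZE) {
            deleteBatch(ids);
            ids = [];
        }
    }
    if (ids.length > 0) {
        deleteBatch(ids);
    }
}

// Delete a batch of sys_audit records by an IN list of sys_ids.
// Small, targeted DELETEs by primary key avoid gap locks on the table.
function deleteBatch(ids) {
    var gr = new GlideRecord('sys_audit');
    gr.addQuery('sys_id', 'IN', ids.join(','));
    gr.deleteMultiple();
}
```

A real script would also need to handle sys_audit_relation/sys_audit_delete records if those are in scope, and would likely restrict the task_sla query to the records or fields you actually want to purge.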
One thing that a colleague suggested, that we have yet to test, is to use Archive and Destroy Rules on sys_audit - similar to how it is used for sys_email. I believe you would need to switch your sys_audit table back to a single physical table instead of having it on Table Extension. That would require work from ServiceNow support to move the data back to the base table from the backend.