
Data Purging – Using the Java API to gracefully purge old data

Most of us have come across a system with forms so bogged down by millions of records of old data that working with them becomes a performance nightmare.  In BMC Remedy ITSM systems, the usual culprits are audit forms, log forms, and the like.  In situations like this, there are several options that can be explored, such as:

  • Deleting chunks of data by hand
    • More difficult since the UserTool has gone the way of the dodo bird
    • Very slow since all entries get loaded into your browser, which slows it to a crawl and chews up a ton of memory
    • Easy to experience timeouts if you try to be overzealous with your chunk sizes
  • Writing workflow to delete data
    • Escalation and Filter combos are probably the most common approach taken by admins and consultants to get rid of data in a more controlled and automated fashion. However, this means customization, which can sometimes be held up in red tape or, if not properly built, can go wrong and cause harm.
    • The more forms you want to target with this approach, the more customization is potentially required to support it.
  • Sledge-Hammer approach (SQL Statements)
    • The absolute fastest way to remove data, as it bypasses all the application layers. However, depending on the forms being targeted, it requires a deep understanding of the relationships that may touch the table, as well as a good general understanding of the AR System database schema.
    • TRUNCATE is amazing when you can use it (e.g. a 100% dump), as it works almost instantaneously
  • API Program with some smarts built in
    • Can control pretty much every aspect of what needs to be done and will get you where you need to go without having to resort to manual deletes, customization, or SQL wizardry.
    • Portable and extensible to your heart’s (and programming brain’s) content.

Now let’s look at a solution to this problem using the API.  First, let’s set some goals for our program:

  1. Let’s say you want to keep only the last N days of data in a particular set of forms to satisfy audit requirements
  2. We also need to be able to base N days upon either the Create Date (Field ID 3) or the Modified Date (Field ID 6); see the quick illustration just after this list
  3. We need to do this during production hours (it’s a 24×7 system), so we need to minimize the impact on the server
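
Here is that illustration: a minimal sketch of the qualification string the program builds later on, shown for 30 days based on Create Date (swap Field ID 3 for 6 to base it on Modified Date instead):

// Keep the last 30 days by Create Date (Field ID 3): anything older than
// "now minus 30 days" qualifies for deletion. AR System timestamps are in
// epoch seconds, hence the 60*60*24 conversion.
int daysToKeep = 30;
String qualString = "'3' < ($TIMESTAMP$ - (60*60*24*" + daysToKeep + "))";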

With the above in mind, let’s look at how we can tackle this.  As with all my Remedy Java API code, I have a YAML configuration parser that I use to feed in my connection details as well as any required parameters/inputs.  It is purely optional; however, if you want to use the PurgeData class further down without changes, you will need it.  Otherwise, you are free to modify that class to not require it.  (Note: you can download a copy of all the source files at the bottom of this post.)

Here is the code for the ParseConfig class:

/*
 * Copyright (c) 2014 Curtis Gallant <cgallant@soleauthority.net>
 *
 * Permission to use, copy, modify, and distribute this software for any
 * purpose with or without fee is hereby granted, provided that the above
 * copyright notice and this permission notice appear in all copies.
 *
 * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
 * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
 * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
 * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
 * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
 * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
 * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
 */

package net.soleauthority.arsys.utils;

import org.yaml.snakeyaml.Yaml;

import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Map;

public class ParseConfig {
    private static String configServer = null;
    private static String configUser = null;
    private static String configPassword = null;
    private static int configPort = 0;
    private static int configRPC = 0;
    private static String configQueryType = null;
    private static int configDaysToKeep = 0;
    private static int configActionDelay = 0;
    private static int configSearchChunk = 0;
    private static ArrayList configListOfForms;

    public static String getConfigServer() {
        return configServer;
    }

    public static String getConfigUser() {
        return configUser;
    }

    public static String getConfigPassword() {
        return configPassword;
    }

    public static int getConfigPort() {
        return configPort;
    }

    public static int getConfigRPC() {
        return configRPC;
    }

    public static String getConfigQueryType() {
        return configQueryType;
    }

    public static int getConfigDaysToKeep() {
        return configDaysToKeep;
    }

    public static int getConfigActionDelay() {
        return configActionDelay;
    }

    public static int getConfigSearchChunk() {
        return configSearchChunk;
    }

    public static ArrayList getConfigListOfForms() {
        return configListOfForms;
    }

    protected static void readConfiguration(String configFileName) {
        try {
            InputStream input;
            if (configFileName.equals("")) {
                input = new FileInputStream(new File("PurgeDataConfig.yaml"));
            } else {
                input = new FileInputStream(new File(configFileName));
            }
            Yaml yaml = new Yaml();
            Map<String, ArrayList> config = (Map<String, ArrayList>) yaml.load(input);

            // Connection Info (look it up by key rather than relying on map ordering)
            ArrayList connInfo = config.get("Connection Info");
            for (Object entry : connInfo) {
                // Server Name
                if (((HashMap) entry).get("ServerName") != null) {
                    configServer = ((HashMap) entry).get("ServerName").toString();
                }
                // Username
                if (((HashMap) entry).get("UserName") != null) {
                    configUser = ((HashMap) entry).get("UserName").toString();
                }
                // Password
                if (((HashMap) entry).get("Password") != null) {
                    configPassword = ((HashMap) entry).get("Password").toString();
                }
                // Port
                if (((HashMap) entry).get("Port") != null) {
                    configPort = Integer.parseInt(((HashMap) entry).get("Port").toString());
                }
                // RPC
                if (((HashMap) entry).get("RPC") != null) {
                    configRPC = Integer.parseInt(((HashMap) entry).get("RPC").toString());
                }
                // Type of Query, either Create Date or Modified Date
                if (((HashMap) entry).get("Query Type") != null) {
                    configQueryType = ((HashMap) entry).get("Query Type").toString();
                }
                // Days to keep
                if (((HashMap) entry).get("Days to keep") != null) {
                    configDaysToKeep = Integer.parseInt(((HashMap) entry).get("Days to keep").toString());
                }
                // Search Chunks
                if (((HashMap) entry).get("Search Chunk") != null) {
                    configSearchChunk = Integer.parseInt(((HashMap) entry).get("Search Chunk").toString());
                }
                // Time between delete actions
                if (((HashMap) entry).get("Action Delay") != null) {
                    configActionDelay = Integer.parseInt(((HashMap) entry).get("Action Delay").toString());
                }
            }

            // Forms list
            configListOfForms = config.get("Forms to purge");

            // Close our config file
            input.close();

        } catch (FileNotFoundException e) {
            System.out.println("Can't find " + configFileName);
            System.exit(1);
        } catch (IOException e) {
            System.out.println(e.getMessage());
            System.exit(1);
        }
    }
}

This gives us a YAML configuration file reader that provides all our connection info, options for setting the query type, the number of days of data to keep, an artificial lag between delete operations to reduce the impact on production servers (and help avoid timeouts), a search chunk capability, and an ArrayList of all the forms we want to include in our purging. Below is a sample PurgeDataConfig.yaml file, read by the class above, that I used for testing; it contains some default values and a form I loaded with 500k records of varying data. Change these to whatever form(s) you wish to purge:

# Place your connection information for the target AR Server in the block below
Connection Info:
 - ServerName: <ServerName>
   UserName: <UserName>
   Password: <Password>
   Port: <Port>
   RPC: <RPC>

   # Set the query to be either based on Create Date (Field ID 3) or Modified Date (Field ID 6)
   # Possible values are: Create Date or Modified Date
   Query Type: Create Date

   # Set the number of days to keep in the target forms, should be a number like 30
   Days to keep: 30

   # Enter an optional delay in milliseconds between deletes to reduce the impact on the server, e.g. 10
   Action Delay: 0

   # Set a value for Search Chunk if you wish to limit the size of searches; useful if no Max Get List is defined on the server
   # e.g. 5000 breaks searches into chunks of no more than 5000 so we don't time out the server while generating the list of records to delete
   # This setting is ignored if Max Get List is defined, unless the value is smaller than the Max Get List value. Default value is 0.
   Search Chunk: 0

# Place the list of forms you wish to purge in the block below; each entry starts with "-"
# If your form has a dash in the name, encapsulate it in double quotes
Forms to purge:
 - "CG:TESTPURGE"

Now that we can parse our configuration file, let’s get to the real class that does all the work of purging data.  Here is the high-level flow of the program based on my findings (not including the boilerplate):

  1. Grab the list of forms defined in the config file
  2. Process each form one at a time by creating a list of EntryListInfo objects that contain nothing but the Entry IDs of all the records that should be deleted, based upon the number of days and the type of query (Create Date or Modified Date); the key API call for this step is shown just after this list
  3. Once we have our list of records to delete, delete them one by one, optionally introducing a lag (in milliseconds) between each delete operation (this delay comes from the configuration file).
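
That call asks the server to return only Field ID 1 (the Request ID) for each matching record, which keeps each search as light as possible. Here it is in isolation, exactly as it appears in the full listing below:

// Return only Field ID 1 (Request ID) for each matching entry; totalCount
// reports how many records matched overall so we can page through chunks.
OutputInteger totalCount = new OutputInteger();
List<EntryListFieldInfo> returnFields = new ArrayList<EntryListFieldInfo>();
returnFields.add(new EntryListFieldInfo(1));
List<EntryListInfo> entryLists = ctx.getListEntry(schemaName, qual, 0, maxResults,
        null, returnFields, true, totalCount);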

And now here is the code for our PurgeData class.  As a reminder, you can download the code and a fully packaged executable jar file if you want a copy of the program without having to compile it yourself.  You’ll notice I threw in some additional goodies like a progress bar and per-form timers; both are purely optional, but it’s nice to give visual feedback while it’s running, especially against millions of records where it could be running for hours on end (or if you have put in a lot of delay between delete operations).

/*
 * Copyright (c) 2014 Curtis Gallant <cgallant@soleauthority.net>
 *
 * Permission to use, copy, modify, and distribute this software for any
 * purpose with or without fee is hereby granted, provided that the above
 * copyright notice and this permission notice appear in all copies.
 *
 * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
 * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
 * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
 * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
 * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
 * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
 * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
 */

package net.soleauthority.arsys.utils;

/**
 * PurgeData - This program will purge data from target forms based upon how many days you want to keep
 */

import com.bmc.arsys.api.ARException;
import com.bmc.arsys.api.ARServerUser;
import com.bmc.arsys.api.EntryListFieldInfo;
import com.bmc.arsys.api.EntryListInfo;
import com.bmc.arsys.api.OutputInteger;
import com.bmc.arsys.api.QualifierInfo;
import com.bmc.arsys.api.ServerInfoMap;

import java.util.ArrayList;
import java.util.Date;
import java.util.List;
import java.util.concurrent.TimeUnit;

public class PurgeData {
    private static ARServerUser ctx;

    private PurgeData(String configFileName) {
        ParseConfig.readConfiguration(configFileName);
        ctx = new ARServerUser();
        ctx.setServer(ParseConfig.getConfigServer());
        ctx.setUser(ParseConfig.getConfigUser());
        ctx.setPassword(ParseConfig.getConfigPassword());
        ctx.setPort(ParseConfig.getConfigPort());
        if (ParseConfig.getConfigRPC() != 0) {
            try {
                ctx.usePrivateRpcQueue(ParseConfig.getConfigRPC());
            } catch (ARException e) {
                System.out.println("Invalid RPC port specified for " + ParseConfig.getConfigServer() + ": " + ParseConfig.getConfigRPC());
            }
        }
    }

    public static void main(String[] args) {
        long startTime;
        PurgeData ars;
        if (args.length > 0) {
            ars = new PurgeData(args[0]);
        } else {
            ars = new PurgeData("");
        }
        ars.connectionTest();
        for (Object schema : ParseConfig.getConfigListOfForms()) {
            startTime = timerStart();
            processEntries(schema.toString(), ParseConfig.getConfigDaysToKeep(), ParseConfig.getConfigQueryType());
            timerStop(startTime);
        }
        ctx.logout();
    }

    private static void processEntries(String schemaName, int daysToKeep, String queryType) {
        List<EntryListInfo> entryLists;
        String qualString = null;
        QualifierInfo qual;
        // Select entries OLDER than the cutoff; these are the records to delete
        if (queryType.equals("Create Date")) {
            qualString = "'3' < ($TIMESTAMP$ - (60*60*24*" + daysToKeep + "))";
        } else if (queryType.equals("Modified Date")) {
            qualString = "'6' < ($TIMESTAMP$ - (60*60*24*" + daysToKeep + "))";
        }
        try {
            qual = ctx.parseQualification(schemaName, qualString);
            int maxGetList = getMaxGetListForServer();
            if (ParseConfig.getConfigSearchChunk() > 0 && ParseConfig.getConfigSearchChunk() < maxGetList) {
                entryLists = getEntryListInfos(schemaName, qual, ParseConfig.getConfigSearchChunk());
            } else {
                entryLists = getEntryListInfos(schemaName, qual, maxGetList);
            }
            System.out.println("\nDeleting " + entryLists.size() + " total entries from schema: " + schemaName);
            int counter = 0;
            for (EntryListInfo entryList : entryLists) {
                ctx.deleteEntry(schemaName, entryList.getEntryID(), 0);
                Thread.sleep(ParseConfig.getConfigActionDelay());
                counter++;
                printProgressBar((counter * 100) / entryLists.size());
            }
            System.out.println("\n");
        } catch (ARException e) {
            System.out.println("Can't get list of entries for deleting: " + e.getMessage());
            System.exit(1);
        } catch (InterruptedException e) {
            System.out.println(e.getMessage());
        }
    }

    private static int getMaxGetListForServer() throws ARException {
        ServerInfoMap maxGetListConfiguration = ctx.getServerInfo(new int[]{28}); // 28 = MaxGetList
        return Integer.parseInt(maxGetListConfiguration.firstEntry().getValue().toString());
    }

    private static List<EntryListInfo> getEntryListInfos(String schemaName, QualifierInfo qual, int maxResults) throws ARException {
        OutputInteger totalCount = new OutputInteger();
        List<EntryListInfo> entryLists;
        List<EntryListFieldInfo> returnFields = new ArrayList<EntryListFieldInfo>();
        EntryListFieldInfo returnField = new EntryListFieldInfo(1);
        returnFields.add(returnField);
        entryLists = ctx.getListEntry(schemaName, qual, 0, maxResults, null, returnFields, true, totalCount);
        // If there are more results than maxResults allows in a single call, keep fetching until the list is fully populated
        if (entryLists.size() < totalCount.intValue()) {
            for (int i = entryLists.size(); i < totalCount.intValue(); i += maxResults) {
                entryLists.addAll(ctx.getListEntry(schemaName, qual, i, maxResults, null, returnFields, true, null));
            }
        }
        return entryLists;
    }

    private static void printProgressBar(int percent) {
        StringBuilder bar = new StringBuilder("[");
        for (int i = 0; i < 50; i++) {
            if (i < (percent / 2)) {
                bar.append("=");
            } else if (i == (percent / 2)) {
                bar.append(">");
            } else {
                bar.append(" ");
            }
        }
        bar.append("]   ").append(percent).append("%     ");
        System.out.print("\r" + bar.toString());
    }

    private static long timerStart() {
        return new Date().getTime();
    }

    private static void timerStop(long lStartTime) {
        long lEndTime = new Date().getTime();
        long difference = lEndTime - lStartTime;
        System.out.println("Elapsed time: " + String.format("%d min, %d sec",
                TimeUnit.MILLISECONDS.toMinutes(difference),
                TimeUnit.MILLISECONDS.toSeconds(difference) - TimeUnit.MINUTES.toSeconds(TimeUnit.MILLISECONDS.toMinutes(difference))));
    }

    private void connectionTest() {
        System.out.println();
        try {
            ctx.verifyUser();
        } catch (ARException e) {
            System.out.println("Could not log into server: " + e);
            System.exit(1);
        }
    }
}

There you have it, a sample POC for purging large amounts of data from a Remedy system. This utility is meant to be a proof of concept, and while it works well for me, it has not been heavily tested, so use it with caution. If you have a very fast system, there really is no need to introduce lag with the Action Delay or to search in chunks, so just keep both at zero. If your Remedy system times out on occasion, or typically does on large operations, I’d recommend starting with, say, 10 (milliseconds) in the Action Delay option and 2500 for the Search Chunk and seeing how it goes. If timeouts continue to occur during the delete actions, you can slowly creep up the Action Delay in 5 ms increments; if it times out while gathering the list of entries to delete, you can lower the Search Chunk until you find the sweet spot for your server.
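
In config terms, that conservative starting point is just these two keys in the PurgeDataConfig.yaml shown earlier:

   # Conservative starting point for a server that tends to time out on large operations
   Action Delay: 10
   Search Chunk: 2500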

It should be noted that you can run this program over and over again, or schedule it to run on a periodic basis if you want.  Since the premise of this sample program is to keep data that is at most N days old, you can run it as many times as you need to reach your desired number of entries, and each successive run will get faster as the raw counts drop; run daily, it may finish in a matter of seconds instead of hours-long runs deleting the millions of records that pile up over months and years. Functionality could be added to make this more dynamic, such as being able to write your own qualification in the configuration file for maximum flexibility; however, to keep it simple and to satisfy our stated requirements, I’ve only put the number of days in the control of the user.
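
As a sketch of that idea (not part of the program as shipped), a hypothetical "Custom Qualification" key in the config file could slot into processEntries() like this, with getConfigCustomQual() being an assumed new getter you would add to ParseConfig:

// Hypothetical extension: prefer a raw qualification string from the config
// file and fall back to the built-in date-based qualification when absent.
String customQual = ParseConfig.getConfigCustomQual(); // assumed new getter
if (customQual != null) {
    qualString = customQual;
} else if (queryType.equals("Create Date")) {
    qualString = "'3' < ($TIMESTAMP$ - (60*60*24*" + daysToKeep + "))";
} else if (queryType.equals("Modified Date")) {
    qualString = "'6' < ($TIMESTAMP$ - (60*60*24*" + daysToKeep + "))";
}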

It should also be noted again that this is a sample program and doesn’t go all out on proper exception handling, so keep that in mind and treat it as a simple utility, not a bulletproof solution.

I hope this can be useful to somebody and as always happy coding 🙂

