List Records without Pagination?


#1

Hi,

Is it possible (or on a future roadmap) to list all the records in a Table without Pagination?

I.e. not be bound by the ‘pageSize’ maximum limit of 100?

The reason behind the question is that I’m looking to see if there’s a way to pull all the data from a Table into Splunk, using its ‘REST API Modular Input’ (https://splunkbase.splunk.com/app/1546/).

I can get this working up to a point, but I have no real idea on how to deal with (or if this feature of Splunk can deal with) the Pagination / offset.

I’ll hold my hand up and say that I have very little knowledge of Python (which is what the tool appears to be written in), so writing a custom script / implementation is definitely beyond my ability.

It just seems slightly strange that you can download the whole Table / View to CSV, but with the API you’d have to do a whole sequence of REST calls with the changing ‘offset’ property.

Has anyone run into this as an issue, or any suggestions on how I could tackle it?

I’ve also posted the same question on the Splunk Answers forum to see if there are suggestions from the other perspective (https://answers.splunk.com/answers/442579/rest-api-modular-input-how-to-implement-a-custom-r.html)

Many thanks,

Graham.


#2

Good point.

I’d like that too by the way.


#3

Hi Graham,

The Splunk listing mentions “dynamic setting of URL arguments/HTTP header properties ie: a stream cursor” (https://splunkbase.splunk.com/app/1546/#/details). That sounds like exactly what you’d need to do.

The API and CSV export work quite differently: the API should generally be very fast, while a CSV export may take some time.
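For what it’s worth, the pagination loop itself is simple. Here’s a rough, self-contained Python sketch; I’ve stubbed the HTTP call with canned pages so the shape of the responses is visible (real code would GET https://api.airtable.com/v0/{baseId}/{tableName} with your API key, and the record IDs and offset tokens below are made up):

```python
# Canned Airtable-style pages keyed by offset token.
# The final page has no "offset" key, which is how the API
# signals that there is nothing more to fetch.
PAGES = {
    None: {"records": [{"id": "rec1"}, {"id": "rec2"}], "offset": "tok1"},
    "tok1": {"records": [{"id": "rec3"}], "offset": "tok2"},
    "tok2": {"records": [{"id": "rec4"}]},  # last page: no "offset" key
}

def fetch_page(offset=None):
    # Stand-in for the real HTTP GET, e.g.
    # requests.get(url, params={"offset": offset}, headers=auth_headers)
    return PAGES[offset]

def fetch_all_records():
    records = []
    offset = None
    while True:
        page = fetch_page(offset)
        records.extend(page["records"])
        offset = page.get("offset")  # absent on the final page
        if offset is None:
            return records

all_records = fetch_all_records()
print(len(all_records))  # 4 records gathered across three pages
```

The whole trick is that each response hands you the token for the next request, and the last response simply omits it.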

Alex


#4

I’m not a Python developer, but looking at their code examples (specifically the Twitter handler), this seems like it might get close:

https://raw.githubusercontent.com/damiendallimore/SplunkModularInputsPythonFramework/master/implementations/rest/bin/responsehandlers.py

class AirTableEventHandler:

    def __init__(self, **args):
        pass

    # json and print_xml_stream are provided by the framework's
    # responsehandlers.py, where this class is meant to be added.
    def __call__(self, response_object, raw_response_output, response_type, req_args, endpoint):
        if response_type == "json":
            output = json.loads(raw_response_output)
            for record in output["records"]:
                print_xml_stream(json.dumps(record))

            if "params" not in req_args:
                req_args["params"] = {}

            if output["offset"] is not None:
                req_args["params"]["offset"] = output["offset"]
            else:
                req_args["params"]["offset"] = None
        else:
            print_xml_stream(raw_response_output)

#5

Hi Chester,

Many thanks for this, I’ve tried it on my test system and it’s 99% working.

It’s now pulling all of the records from my Table, which is great.

The last piece of the puzzle is that the last ‘offset’ now gets stored as one of the parameters for when the Splunk input would run again (it runs on a polling interval).

This is slightly strange, as my understanding is that in the last iteration of pulling data from Airtable, ‘offset’ wouldn’t be in the response.

I wonder if there is a way to ‘default’ this parameter at the end of the process?

Thanks again though, that’s helped get me further along the path!

Graham.


#6

Ah! I just edited my previous post to add the following two lines, which should “reset” the offset when it reaches the end. I’m not sure if that means Splunk will be re-importing the same records, or if it handles de-duplication on its own.

    else:
        req_args["params"]["offset"] = None
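(An alternative, if storing None still leaves a stale key in the saved parameters, might be to remove the key entirely. This is just standard dict behaviour, untested against the modular input itself:)

```python
# Hypothetical alternative: drop the "offset" key from the saved
# request parameters instead of storing None. dict.pop with a
# default never raises, even if the key is already gone.
req_args = {"params": {"offset": "itrXXX", "pageSize": 100}}

req_args["params"].pop("offset", None)
print(req_args["params"])  # {'pageSize': 100}
```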

#7

Hi Chester,

Thanks again for taking a look at this. I gave it a go, but it still wasn’t behaving quite right.

But in the meantime, on the Splunk Answers Community site, I had a response and answer from Damien Dallimore, who wrote the Splunk REST API Modular Input.

So he needs to take all the credit for the one below:

class AirTableEventHandler:

    def __init__(self, **args):
        pass

    # json, requests and print_xml_stream are all available inside the
    # framework's responsehandlers.py, where this handler lives.
    def __call__(self, response_object, raw_response_output, response_type, req_args, endpoint):
        if response_type == "json":
            output = json.loads(raw_response_output)

            # Print results from the first response
            for record in output["records"]:
                print_xml_stream(json.dumps(record))

            # .get() avoids a KeyError if the whole table fits in one page
            offset = output.get("offset")

            # Create the pagination loop
            while offset is not None:

                # Construct and GET the next URL
                next_url = response_object.url + '&offset=' + offset
                next_response = requests.get(next_url)
                output = json.loads(next_response.text)

                # Print out results from pagination looping
                for record in output["records"]:
                    print_xml_stream(json.dumps(record))

                # There is no "offset" value in the last JSON response;
                # .get() returns None when it's gone, and the loop exits
                offset = output.get("offset")

        else:
            print_xml_stream(raw_response_output)

I’ve included it here as it may help others who have a similar pagination question and know a bit of Python.