Help

Re: REST API Modular Input: How to implement a custom response handler to deal with pagination limit

1823 0
cancel
Showing results for 
Search instead for 
Did you mean: 
James_Mason1
4 - Data Explorer
4 - Data Explorer

Hello All,

I’ve been trying to pull together a proof of concept using Splunk and a product called Airtable

They have a RESTful API that allows records to be accessed, which is great for use with the REST API Modular Input.

The API structure is:

https://api.airtable.com/v0/[Base]/[Table]

I’m 90% there, but I’m struggling with a ‘pagination’ problem.

The Airtable REST API allows you to pull all of the records in a table, but limits results to a maximum of 100. If there are more results than this, then the response includes an ‘offset’ field in the JSON response. You then need to issue the request again, with this as a parameter.

I know very little about JSON and even less about Python and Splunk Tool, so I don’t have the first clue about how to implement a Custom Response Handler to deal with this.

So far, thanks to previous questions and answers, I’ve got this far with my Response Handler:

class AirTableEventHandler:
 
     def __init__(self,**args):
         pass
 
     def __call__(self, response_object,raw_response_output,response_type,req_args,endpoint):
         if response_type == "json":
             output = json.loads(raw_response_output)
             for record in output["records"]:
                 print_xml_stream(json.dumps(record))
         else:
             print_xml_stream(raw_response_output)

This works for breaking up the results into Events, but doesn’t begin to deal with the pagination question.

I’ve truncated the output to 4 records, but this is the response that you’d actually get back when there are more than 100 records:

{"records":[{"id":"rec4rlLs4jDeqhFRR","fields":{"Outlet":"20-001","Location":["rec5lh6NgFH33y6ax"],"Rack":["recUOCOlAycENRdo5"],"Type":"Wall Socket","Line checked":true,"Found":true,"On-Drawing":true,"Location_Static":"AP.1.004","Rack_Static":"Rack 20","Location Name":["Physiotherapy Area (Male)"]},"createdTime":"2016-08-22T08:27:05.000Z"},{"id":"recrFjR0pyu2RIjp6","fields":{"Outlet":"20-002","Location":["rec5lh6NgFH33y6ax"],"Rack":["recUOCOlAycENRdo5"],"Type":"Wall Socket","Line checked":true,"Found":true,"On-Drawing":true,"Location_Static":"AP.1.004","Rack_Static":"Rack 20","Location Name":["Physiotherapy Area (Male)"]},"createdTime":"2016-08-22T08:27:05.000Z"},{"id":"recQxL1aNe7wdpM4g","fields":{"Outlet":"20-003","Location":["rec5lh6NgFH33y6ax"],"Rack":["recUOCOlAycENRdo5"],"Type":"High Socket","Line checked":true,"Found":true,"On-Drawing":true,"Location_Static":"AP.1.004","Rack_Static":"Rack 20","Location Name":["Physiotherapy Area (Male)"]},"createdTime":"2016-08-22T08:27:05.000Z"},{"id":"recLK0w2w1HTF1Jwu","fields":{"Outlet":"20-004","Location":["rec5lh6NgFH33y6ax"],"Rack":["recUOCOlAycENRdo5"],"Type":"High Socket","Found":true,"On-Drawing":true,"Location_Static":"AP.1.004","Rack_Static":"Rack 20","Location Name":["Physiotherapy Area (Male)"]},"createdTime":"2016-08-22T08:27:05.000Z"}],"offset":"itrATjKH4HwrwifHj/rectLCyqqrnhD0jr9"}

The crucial bit here being the:

 "offset":"itrBTjKH4HwrwifHj/rectLCyqqrnhD0jr9"
This changes with each 100 record response.

So to get the next 100 results, I’d need to do something like:

 https://api.airtable.com/v0/[Base]/[Table]?offset=itrBTjKH4HwrwifHj/rectLCyqqrnhD0jr9
(And then the rest of the "URL Arguments" as defined in the Modular Input)

Would anyone be able to suggest a Custom Response Handler that could iterate over this offset key, to capture all of the results?

In general, Airtable looks to be a very flexible DB solution for some SMB use-cases, so would be great if I could get this working with Splunk.

Thanks in advance,

1 Reply 1
Jamesmillere
4 - Data Explorer
4 - Data Explorer

Hi,

As per experience, it seems very complex yes I will try my best to solve it with example.

To implement pagination in your custom response handler for the Airtable REST API, you can modify the handler to make multiple requests with different offset values until all records have been retrieved.

Here's an example of how you can modify the existing response handler to implement pagination:

Python

class AirTableEventHandler:

def __init__(self, **args):
self.offset = None

def __call__(self, response_object, raw_response_output, response_type, req_args, endpoint):
if response_type == "json":
output = json.loads(raw_response_output)
for record in output["records"]:
print_xml_stream(json.dumps(record))

# Check if there are more records to retrieve
if "offset" in output:
self.offset = output["offset"]
# Make another request with the new offset value
self.get_next_batch()

else:
print_xml_stream(raw_response_output)

def get_next_batch(self):
# Get the original request URL
url = self.offset.split("?")[0]

# Get the original request URL arguments
query = dict(parse_qsl(urlsplit(self.offset).query))

# Add the offset argument to the query
query["offset"] = self.offset

# Make another request with the new query parameters
response = requests.get(url, params=query, headers={"Authorization": "Bearer [YOUR API KEY]"})
self.__call__(response, response.text, "json", {}, "")

This modified handler keeps track of the current offset value and makes another request with the new offset value if there are more records to retrieve. It does this by calling the get_next_batch() method, which constructs a new URL with the updated offset value and makes a new request using the requests library. The Authorization header is added to the request to authenticate with the Airtable API using your API key.

To use this modified handler, you can instantiate it in your Splunk configuration file and use it with the REST API Modular Input. For example:

[my_airtable_data_input://my_airtable_query]
# Set the handler for the input
handler = my_module.AirTableEventHandler

# Set the URL and API key for the Airtable REST API

Replace [Base], [Table], and [YOUR API KEY] with the appropriate values for your Airtable account.

With this modified handler, this Splunk input will retrieve all records from the Airtable REST API, regardless of the number of records.


url = https://api.airtable.com/v0/[Base]/[Table]
api_key = [YOUR API KEY]

# Set any other query parameters for the Airtable API
# For example, to limit the number of records to retrieve:
maxRecords = 100

I hope this will help you.

Thanks