Consume HTTP APIs lazily


TL;DR: Consuming APIs as lazy sequences reduces memory consumption and encourages interactive development.

HTTP APIs are ubiquitous, and consuming them is a task that comes up quite regularly (at least for me). I have therefore found the following idiom pretty useful: it turns a paginated remote data source (in this case the GitHub Jobs API) into a lazy sequence:

(defn- fetch-lazy-jobs-seq!
  "Returns a lazy seq of job postings, fetching further pages on demand."
  ([]
   (fetch-lazy-jobs-seq! 0))
  ([page]
   (let [jobs-url     (fn [page] (format "https://jobs.github.com/positions.json?page=%d" page))
         {body :body} (http/get (jobs-url page) {:as :json})]
     (if (seq body)
       (lazy-cat body (fetch-lazy-jobs-seq! (inc page)))
       body))))
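
The snippet assumes an HTTP client bound to http whose get supports the :as :json response coercion (clj-http does) and clojure.string bound to str for the filtering further below; a matching namespace declaration (the namespace name is made up) could look like this:

(ns example.jobs
  (:require [clj-http.client :as http]
            [clojure.string :as str]))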

In the REPL it is convenient to inspect just a part of the data without waiting for many HTTP requests to be sent:

(-> (fetch-lazy-jobs-seq!) first keys)
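
Since further pages are only requested when the sequence is actually consumed, realising a small prefix stays just as cheap (:location is one of the response fields also used below):

(->> (fetch-lazy-jobs-seq!) (take 5) (map :location))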

And in production, interim results can be garbage collected as soon as they have been processed, for example when looking for all locations with Clojure jobs:

(->> (fetch-lazy-jobs-seq!)
     (filter #(str/includes? (:description %) "Clojure"))
     (map :location)
     (set))

As this leads to many requests to the same host, reusing connections via a connection manager helps with performance if necessary. You can find the example code on GitHub.
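
As a minimal sketch, assuming clj-http as the client: its with-connection-pool macro lets every request inside its scope reuse pooled connections (the option values are only illustrative, and the lazy sequence must be fully realised, here via set, before the scope ends):

(http/with-connection-pool {:timeout 5 :threads 4 :default-per-route 10}
  (->> (fetch-lazy-jobs-seq!)
       (filter #(str/includes? (:description %) "Clojure"))
       (map :location)
       (set)))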

