Irrational Exuberance!

Quick tutorial on using GraphQL with Python.

November 18, 2018. Filed under graphql, python

Having spent some time earlier this year experimenting with gRPC for defining and integrating server/client pairs, this weekend I wanted to spend a bit of time doing a similar experiment with GraphQL.

I couldn't find any particularly complete tutorials for doing this in Python, so I've written up what I hope is a useful collection of notes for someone looking to try out GraphQL in Python.


Full tutorial code is on GitHub.

Goal

At Digg, we had a simple service which would crawl a given URL and return its title, a summary and any worthy images. Early Digg relied heavily on unreliable scraping heuristics to extract these characteristics, but most websites these days have enough social media metadata to greatly simplify the process.

In this project we're going to recreate that crawling service, building it with the extraction library I wrote some years back (which was on my mind because I recently updated it to be Python 3 compatible).

When we're done, we'll submit client requests like this:

{
  website(url: "https://lethain.com/migrations") {
      title
      image
  }
}

The server's response will be:

{
  "data": {
    "website": {
      "title": "Migrations: the sole scalable fix to tech debt.",
      "image": "https://lethain.com/static/blog/2018/migrations-hero.png",
    }
  }
}

Each website will also include a description field.

Setup

Assuming you have Python 3 available locally, let's first create a virtual environment for our dependencies and then install them:

mkdir tutorial
cd tutorial
python3 -m venv env
. ./env/bin/activate
pip install extraction graphene flask-graphql requests

If you want the exact versions used in this tutorial, you can find them in the requirements.txt on GitHub.

Crawl & Extract

Before jumping into the GraphQL pieces, let's quickly write the code for crawling and extracting data from a website, since that's a bit of a sideshow.

Using extraction and requests, this looks like:

import graphene
import extraction
import requests

def extract(url):
    html = requests.get(url).text
    extracted = extraction.Extractor().extract(html, source_url=url)
    return extracted

Which we'd use as follows:

>>> extract('https://lethain.com/migrations')
<Extracted:
  (title: 'Migrations: the sole scalable fix to tech debt.', 4 more),
  (url: 'https://lethain.com/migrations/', 1 more),
  (image: 'https://lethain.com/static/blog/2018/migrations-he', 1 more),
  (description: 'Migrations are both essential and frustratingly', 5 more),
  (feed: 'https://lethain.com/feeds/')>

Each Extracted object makes five pieces of data available: title, url, image, description and feed.
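
In code, those are available as attributes on the returned object. A quick sketch (attribute names as shown above; the singular attributes return the first extracted value):

extracted = extract('https://lethain.com/migrations')
print(extracted.title)        # 'Migrations: the sole scalable fix to tech debt.'
print(extracted.image)        # first extracted image URL
print(extracted.description)  # first extracted description
print(extracted.feed)         # 'https://lethain.com/feeds/'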


Full code in extraction_tutorial/schema.py

Schema

At the base of every GraphQL API is a GraphQL schema, which describes the objects, fields and types for the exposed API. We use Graphene to describe our schema as a Python object.

Writing a schema to describe an extracted website is fairly straightforward, for example:

import graphene

class Website(graphene.ObjectType):
    url = graphene.String(required=True)
    title = graphene.String()
    description = graphene.String()
    image = graphene.String()

Here we're only using graphene.String to describe our fields' types, but each field could also be another object we've described, or one of the other available types: enums, scalars, lists and so on.
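
For example, if we wanted richer fields, a hypothetical variant might look like this (these extra fields aren't part of our tutorial schema; they're just to show a couple of the other types graphene offers):

import graphene

class RichWebsite(graphene.ObjectType):
    url = graphene.String(required=True)
    title = graphene.String()
    # every extracted image URL, not just the first
    images = graphene.List(graphene.String)
    # a made-up numeric field, to show a non-string scalar
    word_count = graphene.Int()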

What's a bit unexpected is that we also have to write a schema that describes the query we'll make to retrieve these objects:

import graphene

class Query(graphene.ObjectType):
    website = graphene.Field(Website, url=graphene.String())

    def resolve_website(self, info, url):
        extracted = extract(url)
        return Website(url=url,
                       title=extracted.title,
                       description=extracted.description,
                       image=extracted.image)

In this case, website is an object type that we support querying against, url is a parameter that gets passed along to the resolution function, and resolve_website is called for each request for a website object.

Note that there is a fair amount of magic happening here, with the names having to match exactly for this to work. Most of my issues writing this code were typos across fields, causing them not to match properly. Also note that extract is the function we wrote in the previous section.

The final step is to create a graphene.Schema instance which you'll pass to your server to describe the new API you've created:

schema = graphene.Schema(query=Query)

With that done, you've fully described your new GraphQL API.
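
Although we'll serve the schema over HTTP in the next section, you can already exercise it directly from Python, which is handy for quick testing. A minimal sketch using graphene's schema.execute:

result = schema.execute('{ website(url: "https://lethain.com/migrations") { title } }')
print(result.errors)  # None if the query resolved cleanly
print(result.data)    # e.g. {'website': {'title': 'Migrations: the sole scalable fix to tech debt.'}}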


Full code in extraction_tutorial/schema.py

Server

Now that we've written our schema, we can start serving it over HTTP using flask and flask-graphql:

from flask import Flask
from flask_graphql import GraphQLView
from extraction_tutorial.schema import schema

app = Flask(__name__)
app.add_url_rule(
  '/',
  view_func=GraphQLView.as_view('graphql', schema=schema, graphiql=True)
)
app.run()

Note that unless you've downloaded the example code, your schema will have a different import path. It's also fine to put your schema and the server into a single file if you don't want to mess with import paths.

Now you can run your server via

python server.py

It'll start up and be available at localhost:5000.


Full code in extraction_tutorial/server.py

Client

Although specialized GraphQL clients exist, you don't need one to make requests against your new API; you can stick with the HTTP clients you're already used to. In this example, we'll use requests.

import requests

q = """
{
  website(url: "https://lethain.com/migrations") {
    title
    image
    description
  }
}
"""

resp = requests.post("http://localhost:5000/", params={'query': q})
print(resp.text)

Running that script, the output would be:

{
  "data": {
    "website": {
      "title": "Migrations: the sole scalable fix to tech debt.",
      "image":"https://lethain.com/static/blog/2018/migrations-hero.png",
      "description":"Migrations are both essential and frustratingly..."
    }
  }
}

You can customize the contents of q to retrieve different fields, or even use things like aliases to retrieve multiple objects at once.
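
For example, here's what q might look like using aliases to crawl two pages in a single request (the second URL is just a placeholder; substitute any page you like):

{
  migrations: website(url: "https://lethain.com/migrations") {
    title
  }
  other: website(url: "https://lethain.com/some-other-post") {
    title
  }
}

The response nests one result under each alias, migrations and other.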


Full code in extraction_tutorial/http_client.py

Extending objects

Perhaps the most interesting and exciting part of GraphQL is how easy it is to extend your objects without causing compatibility issues for your clients. For example, let's imagine we wanted to start returning pages' RSS feeds as well, through a new feed field.

We can add it to Website and update our resolve_website method to return the feed field as follows:

import graphene

class Website(graphene.ObjectType):
    url = graphene.String(required=True)
    title = graphene.String()
    description = graphene.String()
    image = graphene.String()
    feed = graphene.String()    

class Query(graphene.ObjectType):
    website = graphene.Field(Website, url=graphene.String())

    def resolve_website(self, info, url):
        extracted = extract(url)
        return Website(url=url,
                       title=extracted.title,
                       description=extracted.description,
                       image=extracted.image,
                       feed=extracted.feed)

If you wanted to retrieve this new field, you'd just update your query to also request it, in addition to the other fields like title and image that you're already retrieving.
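
For example, the query from the client section would become something like this, with only the feed line being new:

{
  website(url: "https://lethain.com/migrations") {
    title
    image
    description
    feed
  }
}

Existing clients that never ask for feed keep working unchanged, which is exactly the compatibility win described above.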

Introspection

One of the most powerful aspects of GraphQL is that its servers support introspection, which allows both humans and automated tools to understand the available objects and operations.

The clearest example of this: if you're running the server we just built, you can navigate to localhost:5000 and use GraphiQL to test your new API directly.

These capabilities aren't restricted to GraphiQL: you can also access them through the same query interface you use against the rest of your API. As a simple example, we can ask about the queries exposed by our sample service:

{
  __type(name: "Query") {
    fields {
      name
      args {
        name
      }
    }
  }
}

To which the server would reply:

{
  "data": {
    "__type": {
      "fields": [
        {
          "name": "website",
          "args": [{ "name": "url" }]
        }
      ]
    }
  }
}

There are a bunch of other introspection queries available, which are a bit clumsy to write, but expose a tremendous amount of power to tool builders. Definitely worth playing with!
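
As one more example, this standard introspection query lists the name of every type the schema exposes (including GraphQL's built-in __-prefixed types):

{
  __schema {
    types {
      name
    }
  }
}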

Closing thoughts

Overall, I was quite impressed with how easy it was to work with GraphQL, and even more impressed with how easy it was to integrate against. This approach to describing objects was more intuitive to me than gRPC's, with the latter still feeling more akin to writing a protocol than describing an object.

At this point, if I were writing a product API, GraphQL would be the first tool I'd reach for, and if I were writing a piece of infrastructure, I'd still prefer gRPC, especially for its authentication and tight HTTP/2 integration (e.g. for bidirectional streaming).


Lots of additional questions to dig into here at some point:

  • How do they fare in terms of data compression?
  • Does compression even really matter if the servers are compressing the results?
  • Does GraphQL have worse protocol compression but superior field compression since folks have to explicitly ask for what they need?
  • How well do their field deprecation stories work in practice? Both have some story around deprecation, and neither seems ideal, though GraphQL's deprecation warnings seem a bit superior: you could imagine writing your client libraries to surface any deprecation warnings returned by the API with a log of some sort.

I'm sure there are a bunch more as well!