Quick tutorial on using GraphQL with Python.
Having spent some time earlier this year experimenting with gRPC for defining and integrating server/client pairs, this weekend I wanted to spend a bit of time doing a similar experiment with GraphQL.
I couldn’t find any particularly complete tutorials for doing this in Python, so I’ve written up what I hope is a useful collection of notes for someone looking to try out GraphQL in Python.
Full tutorial code is on Github.
Goal
At Digg, we had a simple service which would crawl a given URL and return its title, a summary and any worthy images. Early Digg relied heavily on unreliable scraping heuristics to extract these characteristics, but most websites these days have enough social media metadata to greatly simplify the process.
The project we’re building is to recreate that crawling service, and we’ll build it using the extraction library I wrote some years back (which was on my mind because I recently updated it to be Python3 compatible).
When we’re done, we’ll submit client requests like this:
{
  website(url: "https://lethain.com/migrations") {
    title
    image
  }
}
The server’s response will be:
{ "data": { "website": { "title": "Migrations: the sole scalable fix to tech debt.", "image": "https://lethain.com/static/blog/2018/migrations-hero.png", } } }
Each website will also include a description field.
Setup
With the assumption that you have Python3 locally available, let’s first create a virtual environment for our dependencies, and then install them:
mkdir tutorial
cd tutorial
python3 -m venv env
. ./env/bin/activate
pip install extraction graphene flask-graphql requests
If you want the exact versions used in this tutorial, you can find them in the requirements.txt on Github.
Crawl & Extract
Before jumping into GraphQL, let’s quickly write the code for crawling and extracting data from a website, since that’s a bit of a sideshow.
Using extraction and requests, this looks like:
import graphene
import extraction
import requests

def extract(url):
    html = requests.get(url).text
    extracted = extraction.Extractor().extract(html, source_url=url)
    return extracted
Which we’d use as follows:
>>> extract('https://lethain.com/migrations')
<Extracted: (title: 'Migrations: the sole scalable fix to tech debt.', 4 more),
 (url: 'https://lethain.com/migrations/', 1 more),
 (image: 'https://lethain.com/static/blog/2018/migrations-he', 1 more),
 (description: 'Migrations are both essential and frustratingly', 5 more),
 (feed: 'https://lethain.com/feeds/')>
Each Extracted object makes five pieces of data available: title, url, image, description and feed.
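Those pieces of data are available as attributes on the returned object; continuing the run above, accessing them looks roughly like this (any attribute may come back as None if the page didn’t expose that metadata):

extracted = extract('https://lethain.com/migrations')
# each attribute may be None if the page didn't expose that metadata
print(extracted.title)  # 'Migrations: the sole scalable fix to tech debt.'
print(extracted.feed)   # 'https://lethain.com/feeds/'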
Full code in extraction_tutorial/schema.py
Schema
At the base of every GraphQL API is a GraphQL schema, which describes the objects, fields and types for the exposed API. We use Graphene to describe our schema as a Python object.
Writing a schema to describe an extracted website is fairly straightforward, for example:
import graphene

class Website(graphene.ObjectType):
    url = graphene.String(required=True)
    title = graphene.String()
    description = graphene.String()
    image = graphene.String()
Here we’re only using graphene.String to describe our fields’ types, but each field could be another object we’ve described, or a number of other enums, scalars, lists and such.
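For illustration only, and not part of the tutorial’s schema, a few of those other field types might look like this in graphene (the WebsiteStats object and its fields are hypothetical):

import graphene

class WebsiteStats(graphene.ObjectType):
    # hypothetical integer field
    word_count = graphene.Int()
    # hypothetical boolean field
    has_feed = graphene.Boolean()
    # hypothetical list-of-strings field
    image_urls = graphene.List(graphene.String)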
What’s a bit unexpected is that we also have to write a schema that describes the query we’ll make to retrieve these objects:
import graphene

class Query(graphene.ObjectType):
    website = graphene.Field(Website, url=graphene.String())

    def resolve_website(self, info, url):
        extracted = extract(url)
        return Website(url=url,
                       title=extracted.title,
                       description=extracted.description,
                       image=extracted.image)
In this case website is an object type that we support querying against, url is a parameter that’ll be passed along to the resolution function, and resolve_website is then called for each request against a website object.
Note that there is a fair amount of magic happening here: the names have to match exactly for this to work, with the resolver for a field named website needing to be called resolve_website, and the url argument declared on graphene.Field needing to match the url parameter of the resolver. Most of my issues writing this code were typos across fields, causing them not to match properly.

Also note that extract is the function we wrote in the previous section.
The final step is to create a graphene.Schema instance which you’ll pass to your server to describe the new API you’ve created:
schema = graphene.Schema(query=Query)
With that done, you’ve created a complete GraphQL schema, and the next step is to start serving it.
Full code in extraction_tutorial/schema.py
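If you want to sanity-check the schema before standing up a server, a graphene schema can also execute queries directly in-process. A minimal check, which isn’t part of the tutorial code, might look like:

from extraction_tutorial.schema import schema

# run a query directly against the schema, without any HTTP server
result = schema.execute('{ website(url: "https://lethain.com/migrations") { title } }')
print(result.errors)  # should be empty/None if the query resolved cleanly
print(result.data)    # e.g. {'website': {'title': '...'}}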
Server
Now that we’ve written our schema, we can start serving it over HTTP using flask and flask-graphql:
from flask import Flask
from flask_graphql import GraphQLView
from extraction_tutorial.schema import schema

app = Flask(__name__)
app.add_url_rule(
    '/',
    view_func=GraphQLView.as_view('graphql', schema=schema, graphiql=True)
)
app.run()
Note that unless you’ve downloaded the example code, your schema will have a different import path. It’s also fine to put your schema and the server into a single file if you don’t want to mess with import paths.
Now you can run your server via:

python server.py

After which it’ll start running, available at localhost:5000.
Full code in extraction_tutorial/server.py
Client
Although special GraphQL clients exist, you don’t need one to make requests against your new API; you can stick with the HTTP clients you’re already used to. In this example we’ll use requests.
import requests

q = """
{
  website(url: "https://lethain.com/migrations") {
    title
    image
    description
  }
}
"""
resp = requests.post("http://localhost:5000/", params={'query': q})
print(resp.text)
Running that script, the output would be:
{ "data": { "website": { "title": "Migrations: the sole scalable fix to tech debt.", "image":"https://lethain.com/static/blog/2018/migrations-hero.png", "description":"Migrations are both essential and frustratingly..." } } }
You can customize the contents of q to retrieve different fields, or even use things like aliases to retrieve multiple objects at once.
Full code in extraction_tutorial/http_client.py
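As a hypothetical sketch of the alias idea, the first and second labels below are arbitrary alias names that let a single request return two websites at once (the second URL is just an example):

import requests

# hypothetical query using aliases to fetch two pages in one request
q = """
{
  first: website(url: "https://lethain.com/migrations") {
    title
  }
  second: website(url: "https://lethain.com/") {
    title
  }
}
"""
resp = requests.post("http://localhost:5000/", params={'query': q})
print(resp.text)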
Extending objects
Potentially the most interesting and exciting part of GraphQL is how easy it is to extend your object without causing compatibility issues in your client. For example, let’s imagine we wanted to start returning pages’ RSS feeds as well through a new feed field.
We can add it to Website and update our resolve_website method to return the feed field as follows:
import graphene

class Website(graphene.ObjectType):
    url = graphene.String(required=True)
    title = graphene.String()
    description = graphene.String()
    image = graphene.String()
    feed = graphene.String()

class Query(graphene.ObjectType):
    website = graphene.Field(Website, url=graphene.String())

    def resolve_website(self, info, url):
        extracted = extract(url)
        return Website(url=url,
                       title=extracted.title,
                       description=extracted.description,
                       image=extracted.image,
                       feed=extracted.feed)
If you wanted to retrieve this new field, you’d just update your query to also request it, in addition to the other fields like title and image that you’re already retrieving.
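For example, the query string from the client section would simply grow one more field:

# same client code as before, with feed added to the requested fields
q = """
{
  website(url: "https://lethain.com/migrations") {
    title
    image
    description
    feed
  }
}
"""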
Introspection
One of the most powerful aspects of GraphQL is that its servers support introspection, which allows both humans and automated tools to understand the available objects and operations.
The best example of this is that if you’re running the example we built, you can navigate to localhost:5000 and use GraphiQL to directly test your new API.
These capabilities aren’t restricted to GraphiQL; you can also integrate with them using the same query interface you’d use to query your new API. As a simple example, we can ask about the queries exposed by our sample service:
{
  __type(name: "Query") {
    fields {
      name
      args {
        name
      }
    }
  }
}
To which the server would reply:
{ "data": { "__type": { "fields": [ { "name": "website", "args": [{ "name": "url" }] } ] } } }
There are a bunch of other introspection queries available, which are a bit clumsy to write, but expose a tremendous amount of power to tool builders. Definitely worth playing with!
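For example, a query against the standard __schema field lists every type the server knows about, and you can run it with the same requests-based client as before:

import requests

# standard GraphQL introspection query: list the name of every type in the schema
q = """
{
  __schema {
    types {
      name
    }
  }
}
"""
resp = requests.post("http://localhost:5000/", params={'query': q})
print(resp.text)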
Closing thoughts
Overall, I was quite impressed with how easy it was to work with GraphQL, and even more impressed with how easy it was to integrate against. This approach to describing objects was more intuitive to me than gRPC’s, with the latter still feeling more akin to writing a protocol than describing an object.
At this point, if I were writing a product API, GraphQL would be the first tool I’d reach for, and if I were writing a piece of infrastructure, I’d still prefer gRPC, especially for its authentication support and tight HTTP/2 integration (e.g. for bi-directional streaming).
Lots of additional questions to dig into here at some point:
- How do they fare in terms of data compression?
- Does compression even really matter if the servers are compressing the results?
- Does GraphQL have worse protocol compression but superior field compression since folks have to explicitly ask for what they need?
- How well do their field deprecation stories work in practice? Both have some story around deprecation, neither of which seems ideal, though GraphQL’s deprecation warnings seem a bit superior, since you could imagine writing your client libraries to surface every deprecation warning returned by the API in a log of some sort (see the sketch below).
I’m sure there are a bunch more as well!
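On the deprecation question above, graphene does support marking fields with a deprecation_reason, which introspection then exposes to clients; a minimal sketch, with the summary field being hypothetical:

import graphene

class Website(graphene.ObjectType):
    url = graphene.String(required=True)
    title = graphene.String()
    description = graphene.String()
    # hypothetical deprecated field; the reason is exposed via introspection,
    # which is what client-side deprecation logging could key off of
    summary = graphene.String(deprecation_reason="Use description instead")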