9 Lessons From a Year of Apollo Federation

Feb 17, 2021

Kyle Schrade
Engineer at StockX
@NotKyleSchrade

At StockX, our engineering team had to move quickly to build out the systems required to support our rapidly expanding business. When you’re in a build-build-build mindset, it’s easy to forget to step back and look at your architecture holistically. For each new use case we had to support, our engineering team built a custom RESTful API endpoint with its own particular functionality. After a year, we realized how much boilerplate was required every time we added a new endpoint, and we wondered whether there was a better way to scale out our API.

We decided to use GraphQL.

Migrating a massive number of RESTful API endpoints to GraphQL, as well as building our GraphQL expertise from the ground up, was no simple feat. I’d like to highlight some of the lessons we wish we had known when we were getting started.

1. Build your schema as you go, don’t try to do it all at once

Since we had an existing API layer, our initial approach to GraphQL schema design was to mirror the shape of the existing RESTful APIs. We figured this was the best way to get buy-in for a move to GraphQL.

This strategy worked well for the proof-of-concept, but once we started building it out for production, we realized that we had set ourselves up for pain. Our RESTful APIs had a ton of duplication, and by designing our GraphQL schema based on those APIs, we ended up with a ton of duplication in our schema. A year later, we are still in the process of removing those unused fields.

We now follow a practice we call Just In Time Schema (JITS) development. It’s essentially YAGNI (“you ain’t gonna need it”) applied to schema design: instead of adding fields and types upfront, we wait to expose new things until a consumer truly needs them.

2. GraphQL forces domain conversations

Our Just In Time Schema development practice led us to start conversations to better understand the division of responsibilities and the relationships between services and types. We discussed what fields belong in which type, what to do with our massive existing types, and how to break them into smaller, better-namespaced types. We are working towards a place where our schema is intuitive and easy to use.

It’s imperative to think about how to handle types when using Apollo Federation, but more on that later.

3. Build your documentation with your schema

Our RESTful APIs were sorely lacking in documentation, and consuming them relied heavily on tribal knowledge. It was tough to understand which use cases we supported, which led to duplication between endpoints and made maintaining backward compatibility a challenge.

One thing that has been a massive boon for our GraphQL adoption is the documentation built into our schema through documentation strings. GraphQL pushes us to document each type and field as we build out the schema, and on top of that, Apollo provides a schema explorer, Apollo Studio, that surfaces those descriptions. Our consumers no longer have to guess which fields are required in a request or which fields are guaranteed in a response.
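For readers new to GraphQL, documentation strings are the descriptions written directly above types and fields in the schema definition language (SDL), either as `"""` block strings or as ordinary `"..."` strings; schema explorers such as Apollo Studio display them. The sketch below simply holds a small SDL schema in a TypeScript string. The `Product` type and its fields are hypothetical illustrations, not StockX’s actual schema.

```typescript
// Hypothetical schema for illustration only; not StockX's real types.
// In SDL, a string placed directly before a type or field is its description,
// which schema explorers show alongside the type information.
const typeDefs: string = `
  """
  A product listed on the marketplace.
  """
  type Product {
    "Stable, unique identifier for the product."
    id: ID!

    "Display name shown to customers."
    title: String!

    """
    Lowest current asking price in cents, or null if there are no asks.
    """
    lowestAsk: Int
  }

  type Query {
    "Look up a single product by its identifier."
    product(id: ID!): Product
  }
`;
```

Note how the non-null markers (`!`) and the descriptions together tell a consumer what is guaranteed in a response without any separate documentation page.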

By building our documentation as we build out our schema, we avoid playing the catch-up game with our documentation.

4. The performance improvement is real

At StockX, GraphQL brought massive performance gains. We’re talking SECONDS of improvement. Yes, you read that right: seconds!

The most critical performance improvement we enjoyed was the reduction of payload size. With GraphQL, payload sizes were up to 7 times smaller! When looking at the performance metrics in our mobile app, we saw significantly lower memory usage after switching to GraphQL.
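Much of a payload reduction like this comes from field selection: a REST endpoint returns its full, fixed response shape, while a GraphQL client receives only the fields it asked for. The sketch below imitates that with a hypothetical product record and a simple top-level field projection. It ignores nested selections, arguments, and everything else a real GraphQL executor does, and the sizes it prints are illustrative only; they say nothing about how StockX measured its results.

```typescript
// A hypothetical "full" REST-style response for one product.
type ProductRecord = Record<string, unknown>;

const fullProduct: ProductRecord = {
  id: "p-123",
  title: "Example Sneaker",
  lowestAsk: 18000,
  highestBid: 17500,
  description: "A long marketing description that the product tile never shows.",
  media: { imageUrl: "https://example.com/p-123.png", gallery: ["a.png", "b.png", "c.png"] },
  traits: [
    { name: "Colorway", value: "Black/White" },
    { name: "Release Date", value: "2020-01-01" },
  ],
  market: { lastSale: 17900, salesLast72Hours: 42, volatility: 0.05 },
};

// Keep only the requested top-level fields, the way a GraphQL selection set
// limits what the server serializes (nested selections are not modeled here).
function project(record: ProductRecord, fields: string[]): ProductRecord {
  const result: ProductRecord = {};
  for (const field of fields) {
    if (field in record) {
      result[field] = record[field];
    }
  }
  return result;
}

// A product tile only needs three fields.
const tileFields = ["id", "title", "lowestAsk"];
const selected = project(fullProduct, tileFields);

// Compare serialized sizes (in characters) of the two responses.
const restChars = JSON.stringify(fullProduct).length;
const selectedChars = JSON.stringify(selected).length;
console.log(`REST-style: ${restChars} chars, selected: ${selectedChars} chars`);
```

The smaller payload also means less for a mobile client to parse and hold in memory, which is consistent with the memory improvement described above.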

As you can imagine, migrating many disparate endpoints to be served by one GraphQL server makes that server a choke point for requests. Before GraphQL, each RESTful endpoint did its own caching, so we had a patchwork of caching strategies. The shift to GraphQL allowed us to standardize our caching approach and improve the overall design. With a standardized caching strategy, we reduced the number of requests to our backend services.

Stay tuned for my other post titled “Caching Approaches in GraphQL Federation” to learn how to cache in a federated GraphQL architecture using memoization, data loader, and distributed caching techniques.
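As a taste of the memoization technique, here is a minimal sketch of per-request memoization: within one GraphQL request, repeated lookups of the same key share a single backend call. `RequestLoader` and `fetchProduct` are hypothetical names written from scratch for this example. This is not the API of the `dataloader` npm package, which also batches keys into a single call; this sketch only memoizes.

```typescript
// Stand-in signature for a real backend client (HTTP, RPC, ...); hypothetical.
type Fetcher<V> = (key: string) => Promise<V>;

// Memoizes lookups by key for the lifetime of one request. Create a fresh
// instance per incoming GraphQL request so cached values never leak between users.
class RequestLoader<V> {
  private readonly cache = new Map<string, Promise<V>>();
  private readonly fetcher: Fetcher<V>;

  constructor(fetcher: Fetcher<V>) {
    this.fetcher = fetcher;
  }

  load(key: string): Promise<V> {
    const cached = this.cache.get(key);
    if (cached !== undefined) {
      return cached;
    }
    // Cache the promise itself so concurrent callers share one in-flight call.
    // A rejected promise stays cached too; a production loader might evict it.
    const pending = this.fetcher(key);
    this.cache.set(key, pending);
    return pending;
  }
}

// Counts how often the (fake) backend is actually hit.
let backendCalls = 0;
const fetchProduct: Fetcher<{ id: string }> = async (id) => {
  backendCalls += 1;
  return { id };
};

// Two resolvers in the same request ask for the same product, a third for another.
const loader = new RequestLoader(fetchProduct);
const first = loader.load("p-123");
const second = loader.load("p-123");
const other = loader.load("p-456");
```

Three `load` calls reach the backend only twice. Scoping the cache to a single request keeps it safe by default; sharing results across requests is where a distributed cache, with explicit expiry, would come in.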

5. Faster development cycles

Since migrating to GraphQL, we’ve noticed a significant improvement in our development cycles. When interacting with a RESTful API, your mindset revolves around the “actions” you take against that API: GET this item, POST that item, and so on. When interacting with a GraphQL API, your mindset shifts to simply thinking about the data.

You’ve probably experienced a situation where a front-end team is designing a new page and needs something slightly different from what an existing endpoint provides. The back-end team is reluctant to add a brand-new endpoint for just this one use case, so they push back and ask the front-end team to work with the existing endpoints. After some back and forth, one of the teams folds and implements something they aren’t happy with. This process takes far more time than you realize, and it still ends in a compromise nobody is thrilled about.

GraphQL reduces that tight coupling between front-end and back-end teams and allows them to work independently. The front-end team can craft a query for exactly the data it needs, and the back-end team can add new types and fields to the existing schema.

6. Don’t reinvent the wheel

When we started our GraphQL implementation, we began with a single GraphQL server. This path was acceptable at first, but as the number of teams implementing GraphQL types for their use cases grew, so did the coordination needed to manage PRs and deployments. When we finally exposed our implementation to production traffic, we ran into challenges scaling our sizeable, monolithic API. We quickly realized we needed a strategy for breaking apart our handlers without making our API a nightmare to consume.

We started using Apollo Federation to build a microservice-like architecture that serves a single GraphQL schema from multiple services. This allowed each team to independently manage its specific part of the GraphQL schema, scale traffic more effectively, and ship code faster.
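Conceptually, federation lets several services each contribute fields to the same entity, joined on a shared key, while a gateway combines the results into one response. The sketch below imitates only that combining step, using two hypothetical in-memory “services” (catalog and pricing) that each own part of a `Product` keyed by `id`. It is not Apollo’s API and skips query planning entirely. In Apollo Federation itself, entities are declared with the `@key` directive, each service supplies a `__resolveReference` resolver for them, and the gateway, not hand-written code like this, does the merging.

```typescript
// Each hypothetical service owns part of the Product entity, keyed by id.
interface CatalogProduct {
  id: string;
  title: string;
}
interface PricingProduct {
  id: string;
  lowestAsk: number;
}

// In-memory stand-ins for each service's data store.
const catalogData = new Map<string, CatalogProduct>([
  ["p-1", { id: "p-1", title: "Example Sneaker" }],
]);
const pricingData = new Map<string, PricingProduct>([
  ["p-1", { id: "p-1", lowestAsk: 18000 }],
]);

// Each "service" resolves a reference to the entity from its key alone.
function resolveCatalog(ref: { id: string }): CatalogProduct | undefined {
  return catalogData.get(ref.id);
}
function resolvePricing(ref: { id: string }): PricingProduct | undefined {
  return pricingData.get(ref.id);
}

type Product = CatalogProduct & PricingProduct;

// Gateway-style merge: ask each owning service for its fields, then combine.
function resolveProduct(id: string): Product | undefined {
  const catalog = resolveCatalog({ id });
  const pricing = resolvePricing({ id });
  if (catalog === undefined || pricing === undefined) {
    return undefined;
  }
  return { ...catalog, ...pricing };
}

const product = resolveProduct("p-1");
```

The point of the design is ownership: the pricing team can change how `lowestAsk` is computed and deploy on its own schedule, while consumers still query one `Product` type.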

7. GraphQL federation scales horizontally

At StockX, our marketing team plans events that bring a massive influx of traffic to our site in a short period. For example, sending a push notification to all our customers causes our traffic to surge as everyone opens the app.

As you can imagine, smaller, more focused services start faster and consume fewer resources. By splitting up our GraphQL handlers using Apollo Federation, we could scale our resolvers independently to handle these spikes. Our GraphQL implementation uses NodeJS, and since NodeJS runs JavaScript on a single thread, horizontal scaling is the best way to handle more traffic. We still have to scale our systems preemptively before large traffic spikes, but with this model, we can scale only the implementing services affected by the event.

For example, if we send out an email to clients about a fantastic blog post, we can scale the blog portion of our federated architecture and leave everything else the same.
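The arithmetic behind scaling only the affected service can be sketched as a per-service replica count: expected peak requests per second divided by what one instance can handle, rounded up, and never below a floor. The service names, capacities, traffic forecasts, and the two-replica floor below are all hypothetical; real capacity figures would come from load testing.

```typescript
// Planning inputs for one implementing service (all numbers hypothetical).
interface ServicePlan {
  name: string;
  expectedPeakRps: number; // forecast requests per second during the event
  rpsPerReplica: number; // measured capacity of a single instance
  minReplicas: number; // floor kept running at all times
}

// Replicas needed to absorb the forecast peak, respecting the floor.
function replicasNeeded(plan: ServicePlan): number {
  const required = Math.ceil(plan.expectedPeakRps / plan.rpsPerReplica);
  return Math.max(plan.minReplicas, required);
}

// An email campaign about a blog post mostly drives blog traffic.
const plans: ServicePlan[] = [
  { name: "blog", expectedPeakRps: 1200, rpsPerReplica: 150, minReplicas: 2 },
  { name: "catalog", expectedPeakRps: 200, rpsPerReplica: 150, minReplicas: 2 },
  { name: "checkout", expectedPeakRps: 50, rpsPerReplica: 100, minReplicas: 2 },
];

const scaled = new Map<string, number>();
for (const plan of plans) {
  scaled.set(plan.name, replicasNeeded(plan));
}
```

With these inputs the blog service scales to 8 replicas while catalog and checkout stay at their floor of 2, which is the “scale only what the event touches” behavior described above.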

8. There’s a learning curve

RESTful APIs have become the default standard for building web apps. Most tutorials for getting started with a back-end framework have you build a simple RESTful endpoint, and you can test that endpoint from your browser. For consumers of a RESTful API, getting familiar with it usually involves making sample requests and looking at the data structures that come back. The learning curve here has a fast start and levels off after a while.

With GraphQL, the learning curve resembles a hockey stick. Since consumers have to specify the fields they want in the response, they need some idea of what they want before making the request. Sure, Apollo Studio can be a huge help when exploring an API, but we’ve noticed that people tend to miss the existence of specific fields when interacting with a new type.

Once you wrap your head around GraphQL, you’ll notice your mindset shift away from the RESTful world toward one where you can move much faster.

9. Read. Then, keep reading.

We have had some stumbles while adopting GraphQL. No one on the team at StockX had prior experience with it.

So we read, then read, and then read even more. The wealth of knowledge out there has helped us make more informed decisions. Some of the articles we found helpful were:

Conclusion

GraphQL has enabled us to have faster development cycles, push smaller changes to production with fewer problems, rethink our UX, and overall make StockX an even better place. It was a ton of work to migrate to GraphQL, but looking back over the last year, I’m confident that the migration was a major success and had an incredibly positive impact on the way we work.

If you’re interested in changing the game, please apply to join our team.