Building Bookshop: Stocking over 10M Books with Ingram

Robb Chen-Ware
Published in HappyFunCorp Codex
4 min read · Jul 11, 2020


HFC worked with Bookshop.org and book distributor Ingram to launch a new destination for online book shopping.

[Image: huge stack of books]

Bookshop has been an ambitious vision from the start, aiming to take market share from Amazon in the online book-selling world and give the profits back to independent bookstores around the country.

After a long road of partnerships, design, engineering, and unexpected circumstances (e.g. COVID-19), Bookshop has been more successful in a short time than anyone involved would have believed possible. Within just a few months of launch it scaled to 2M+ visitors in June and millions in monthly revenue, with much of it going to support indie bookstores. To date, nearly $5M has been raised for those bookstores (a ticker on the site tracks this transparently).

In the planning stages of Bookshop, it was clear that the largest technical task ahead was our integration with fulfillment partner Ingram Content Group (Ingram). Ingram provides key elements of the shopping experience:

  • Product information with related data and assets for millions of book titles
  • Order creation and fulfillment

When we began, Ingram was already actively servicing large eCommerce providers, so we knew the platform would work for our needs. However, getting the pieces into place was a significant task.

Product Data

In order to fulfill Bookshop’s mission, it was important that we stocked as many books as possible in the early days. We wanted to avoid the case where a curious person would find out about Bookshop, search for a book they wanted and not find it, possibly never to return. Fortunately, Ingram carries millions of books — we just needed to get them into our system.

Ingram supplies product data primarily as fixed-width files downloadable from their FTP servers. The data is spread across several dozen files, and the largest can exceed 15GB uncompressed, so it was critical for us to identify which data was relevant for our use. Another factor was understanding the ongoing sync schedule, as the files are interdependent and have different refresh cycles. (Ingram does offer an API for product data, but for our purposes we use it only for on-the-fly refreshing between data syncs.) The destination for this data is a SQL database attached to our Rails-based eCommerce framework, Solidus, a fork of SpreeCommerce.
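To make the fixed-width format concrete, here is a minimal sketch of parsing such records, written in Python for brevity rather than in the Rails stack described above. The field names, offsets, and widths in LAYOUT are purely hypothetical; Ingram's actual record layouts are not reproduced here.

```python
from typing import Iterable, Iterator

# Hypothetical layout: (field name, start offset, width).
# These are illustrative values, not Ingram's real file specification.
LAYOUT = [
    ("isbn13", 0, 13),
    ("title", 13, 30),
    ("price_cents", 43, 7),
]


def parse_line(line: str) -> dict:
    """Slice one fixed-width record into named, whitespace-trimmed fields."""
    record = {
        name: line[start:start + width].strip()
        for name, start, width in LAYOUT
    }
    # Numeric columns are zero-padded text in this made-up layout.
    record["price_cents"] = int(record["price_cents"])
    return record


def parse_file(lines: Iterable[str]) -> Iterator[dict]:
    """Stream records one at a time so a multi-gigabyte file never sits in memory."""
    for line in lines:
        line = line.rstrip("\n")
        if line.strip():  # skip blank lines
            yield parse_line(line)
```

Because parse_file is a generator, it can be fed an open file handle directly and will only hold one line at a time, which matters when a single file is many gigabytes.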

[Figure: Outline of the Ingram import process for Bookshop on GCP. The entire Bookshop application is hosted on Google Cloud.]

To process the large data files, we eventually settled on a tiered approach that could be parallelized. The ingestion service is split into two pieces: a larger instance first ingests the data and splits it into smaller chunks, then passes the chunks to smaller instances, which process them as Sidekiq jobs. Those jobs post records into Solidus via its API.
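The splitting step can be sketched roughly as follows. This is an illustration in Python, not the production code: the enqueue callback is a hypothetical stand-in for whatever hands a chunk to a background worker (a Sidekiq job in the setup described above), and the chunk size is arbitrary.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List


def chunked(records: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield successive lists of at most `size` records from any iterable."""
    iterator = iter(records)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def dispatch(records: Iterable[str], size: int,
             enqueue: Callable[[List[str]], None]) -> int:
    """Split a record stream into chunks and hand each one to `enqueue`.

    `enqueue` is a hypothetical hook standing in for a job queue.
    Returns the number of chunks dispatched.
    """
    count = 0
    for chunk in chunked(records, size):
        enqueue(chunk)
        count += 1
    return count
```

Since each chunk is independent once it has been cut, the number of worker instances can be scaled up or down without changing the splitting logic.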

Given the interdependence of the data, we need to process the files in a specific sequence: categories and product families first, then titles, then images, and so on.
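One way to express that kind of ordering constraint is as a dependency graph that is sorted topologically. The sketch below uses Python's standard graphlib module (Python 3.9+); the file-group names and the exact dependencies are assumptions based on the sequence described above, not Ingram's actual file list.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Each key maps to the set of file groups that must be imported before it.
# The names and edges are illustrative assumptions.
DEPENDS_ON: Dict[str, Set[str]] = {
    "categories": set(),
    "product_families": set(),
    "titles": {"categories", "product_families"},
    "images": {"titles"},
}


def import_order(depends_on: Dict[str, Set[str]]) -> List[str]:
    """Return an order in which every group comes after its dependencies."""
    return list(TopologicalSorter(depends_on).static_order())
```

A nice side effect of this approach is that a circular dependency raises graphlib.CycleError rather than silently importing in the wrong order.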

Though Solidus helped us quickly set up a basic storefront, admin, and checkout flow, the sheer scale of our SKU count required us to replace many components. For example, we lean on Elasticsearch to display book lists, as doing so via SQL would be very taxing, especially with multiple joins and sort criteria. The current result is a reasonably fast-loading site with a very large product database.
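As a rough illustration of why a search index suits this job, a paginated, filtered, and sorted book list can be expressed as a single Elasticsearch query body. The sketch below only builds that body as a Python dictionary; the field names (category_ids, available, published_at, title.keyword) are hypothetical and do not reflect Bookshop's actual index mapping.

```python
from typing import Any, Dict


def book_list_query(category_id: str, page: int = 1,
                    per_page: int = 24) -> Dict[str, Any]:
    """Build an Elasticsearch search body for one page of a category listing.

    Field names are illustrative assumptions, not a real index mapping.
    """
    return {
        "query": {
            "bool": {
                # Filters do not affect scoring and can be cached by Elasticsearch.
                "filter": [
                    {"term": {"category_ids": category_id}},
                    {"term": {"available": True}},
                ]
            }
        },
        "sort": [
            {"published_at": {"order": "desc"}},
            {"title.keyword": {"order": "asc"}},
        ],
        "from": (page - 1) * per_page,
        "size": per_page,
    }
```

The equivalent SQL would typically need joins across product, category, and stock tables plus an ORDER BY over millions of rows, which is the cost the paragraph above alludes to.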

Order Creation and Fulfillment

Orders placed on Bookshop are fulfilled by Ingram as well. Not only does Ingram have the breadth of books that a business like Bookshop needs, it also has the capacity to ship books out rapidly enough that many orders are received in just a couple of days.

Our integration between Solidus and Ingram for ordering was by necessity custom, as was the product ingestion.

Because of varying stock levels, backorder statuses, customer service actions, and the like, we had to cover a few dozen cases to prepare our ordering logic for production. Most notably, a book could be backordered and then eventually filled or eventually canceled, sometimes with gaps of several weeks between actions.
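One common way to keep this many cases manageable is an explicit table of allowed status transitions per order line. The following is a simplified sketch in Python, not Bookshop's actual ordering logic: the status names and the permitted transitions are assumptions chosen to mirror the backordered, filled, and canceled cases mentioned above.

```python
from typing import Dict, Set

# Hypothetical statuses and transitions for a single order line.
ALLOWED: Dict[str, Set[str]] = {
    "submitted": {"confirmed", "backordered", "canceled"},
    "confirmed": {"shipped", "canceled"},
    "backordered": {"confirmed", "shipped", "canceled"},
    "shipped": set(),    # terminal
    "canceled": set(),   # terminal
}


class InvalidTransition(Exception):
    """Raised when a status update does not follow the transition table."""


def apply_update(current: str, new: str) -> str:
    """Return the next status, rejecting transitions the table does not allow."""
    if new == current:
        # Treat repeats as no-ops so re-reading the same update is harmless.
        return current
    if new not in ALLOWED[current]:
        raise InvalidTransition(f"{current} -> {new}")
    return new
```

Because a backorder may resolve weeks later, treating repeated updates as no-ops and rejecting impossible ones (for example, shipping an already canceled line) makes late or duplicate updates safe to process.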

Orders are placed via API with Ingram's system, which returns a simple initial response. Additional order details and status updates are made available over FTP as flat files, which we routinely parse and ingest. Though Bookshop has had 10k+ order days, it was not necessary to build a data ingestion pipeline as extensive as the one needed for product information. The primary challenge here was handling the myriad scenarios that Ingram fulfillment can produce.

Conclusion

Our integration with Ingram represented the single largest piece of engineering effort in building the initial version of Bookshop. The sheer scale of the SKU count and size of associated data made for an engineering challenge beyond what’s typical for a new eCommerce business, but we’re proud of the work we did on this. And, without Ingram as a partner, Bookshop would not have been possible.
