Robustness

What could go wrong?

If you have tried scraping yourself, or your company is currently doing so, you may have noticed that getting first results is not that hard. Retrieving large quantities of data regularly and reliably is a different story.

Things can and will go wrong. Scrapers get blocked by firewalls and target sites change their HTML structure.

Most of the time, we will be able to retrieve the data you want in a reasonably short time. Still, you need to be prepared for when this is not the case. Here is a non-exhaustive list of measures we suggest you take:

1. Log all requests and responses

If anything goes wrong, it is much easier for us to help when you can provide the information we need to investigate and reproduce the problem. Please log:

  • Your HTTP request with all the parameters
  • Our HTTP response including status code and body
  • Error message in case the HTTP call did not work at all
  • A timestamp with time zone
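The four items above could be collected into one structured log record per call. The following is a minimal sketch, not an official client: the helper names `build_log_record` and `log_call`, the logger name, and the record layout are assumptions made for illustration.

```python
import json
import logging
from datetime import datetime, timezone

# Hypothetical logger name; use whatever fits your system.
logger = logging.getLogger("scraping_api_client")


def build_log_record(method, url, params, status=None, body=None, error=None):
    """Collect request, response, error, and timestamp into one dict."""
    return {
        # ISO 8601 timestamp in UTC, so the time zone offset is explicit.
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "request": {"method": method, "url": url, "params": params},
        # No response at all (for example, a connection error) is stored as None.
        "response": None if status is None else {"status": status, "body": body},
        "error": error,
    }


def log_call(record):
    """Write the record as a single JSON line."""
    logger.info(json.dumps(record))
```

Writing one JSON line per call keeps the records easy to search and to pass on when you report a problem.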

📘

Find reasons in the body

When the API responds with a client or server error (HTTP status codes 4xx and 5xx), the response usually carries a JSON body that includes a reason and suggestions on how to handle the error.

However, some error cases caused by infrastructure components lower in the stack may not provide a JSON response.

{
  "success": false,
  "reason": "unauthorized",
  "parameter": "token",
  "comment": "Provide your API key with the parameter 'token'!"
}
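A client could read the reason from such a body while still coping with non-JSON responses. The sketch below assumes only the fields shown in the example above (`reason`, `parameter`, `comment`); the function name `describe_error` and the `"unknown"` fallback are hypothetical.

```python
import json


def describe_error(status_code, body_text):
    """Summarize a 4xx/5xx response, whether or not the body is JSON."""
    try:
        payload = json.loads(body_text)
    except (TypeError, ValueError):
        # Body is missing or not valid JSON.
        payload = None

    if isinstance(payload, dict):
        return {
            "status": status_code,
            "reason": payload.get("reason", "unknown"),
            "parameter": payload.get("parameter"),
            "comment": payload.get("comment"),
        }

    # Non-JSON body, e.g. from an infrastructure component lower in the stack.
    # Keep the raw body so it can still be logged and reported.
    return {
        "status": status_code,
        "reason": "unknown",
        "parameter": None,
        "comment": None,
        "raw_body": body_text,
    }
```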

2. Have a Plan B

What will your system do when

  • a product could not be scraped?
  • an attribute - say shipping costs - of a product could not be scraped?
  • a job takes longer than usual to finish?
  • the API is down for maintenance?

You need good answers to these questions, and those answers need to be reflected in your system.

📘

Example: Repricing

When no current and valid market data is available to your repricing system, consider using a reasonable default price instead. Always work with minimum and maximum prices that have been calculated independently of market data.
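As an illustration of this fallback, here is a minimal sketch. It assumes, as a simplification, that the candidate price is the market price itself; a real repricing rule will be more involved. The function name `choose_price` and its parameters are hypothetical.

```python
def choose_price(market_price, default_price, min_price, max_price):
    """Pick a price that always stays within independently set bounds.

    market_price is None when no current and valid market data is available.
    """
    # Fall back to the default when market data is missing.
    candidate = default_price if market_price is None else market_price
    # Clamp to limits that were calculated without market data.
    return max(min_price, min(max_price, candidate))
```

With bounds of 15.0 and 30.0 and a default of 20.0, missing market data yields 20.0, while an implausibly low market price of 5.0 is raised to 15.0.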

🚧

Disclaimer

While it is our job to provide you with the best data we can, it is your responsibility to not cause harm to your business. Please make sure you employ appropriate safety measures in your system and processes.

3. Retry, smartly

External factors such as network outages can never be ruled out for a web-based system. Many hiccups can be handled robustly by retrying with exponential backoff.

When you catch an error that is likely to be temporary (like a network issue), retry a finite number of times, leaving more and more time between attempts. If all your retries fail, escalate appropriately.
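A retry loop along these lines might look like the following sketch. The exception class `TemporaryError`, the function `retry_with_backoff`, and the default values for attempts and delays are assumptions chosen for illustration; which errors count as temporary is for your client to decide.

```python
import random
import time


class TemporaryError(Exception):
    """Hypothetical marker for errors that are worth retrying."""


def retry_with_backoff(operation, max_attempts=5, base_delay=1.0,
                       max_delay=60.0, sleep=time.sleep):
    """Call operation(); on TemporaryError wait 1s, 2s, 4s, ... and retry."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TemporaryError:
            if attempt == max_attempts:
                # All attempts failed: re-raise so the caller can escalate.
                raise
            # Double the delay on every attempt, capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Add up to 10% random jitter so many clients do not retry in lockstep.
            sleep(delay + random.uniform(0, delay / 10))
```

The `sleep` parameter exists only so the waiting can be replaced in tests; in production the default `time.sleep` applies.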

4. Meter and alert

It is a good idea to measure your system's health and raise an alert when metrics deviate from the norm. Here are some ideas on what to measure:

  • Number of requests to the API
  • Number of retries per request, by error code
  • Response status codes
  • Number of credits used
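The metrics listed above could be tracked with a few in-memory counters, as in this sketch. The class `ApiMetrics`, its method names, and the 5% error-rate threshold are hypothetical; in practice you would likely feed these values into your existing monitoring system.

```python
from collections import Counter


class ApiMetrics:
    """In-memory counters for the metrics listed above."""

    def __init__(self):
        self.requests = 0
        self.retries_by_error = Counter()
        self.status_codes = Counter()
        self.credits_used = 0

    def record(self, status_code, retries=0, error_code=None, credits=0):
        """Record one finished API request."""
        self.requests += 1
        self.status_codes[status_code] += 1
        if retries:
            self.retries_by_error[error_code] += retries
        self.credits_used += credits

    def error_rate(self):
        """Share of requests that ended with a 4xx or 5xx status."""
        if self.requests == 0:
            return 0.0
        errors = sum(n for code, n in self.status_codes.items() if code >= 400)
        return errors / self.requests

    def should_alert(self, max_error_rate=0.05):
        """True when the error rate exceeds the assumed threshold."""
        return self.error_rate() > max_error_rate
```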

5. Keep up to date

It is important to keep your system up to date with regard to libraries, security patches, and so on.

🚧

SSL Cipher Suites

As a security precaution, we disable SSL cipher suites once the security community deprecates them. If your system does not support current, secure cipher suites, it may be unable, or may become unable, to communicate with our servers via HTTPS.

You can also communicate with our servers via HTTP, but we do not recommend doing so for security reasons.
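One way to check what your client offers is to inspect its TLS configuration. The sketch below uses Python's standard `ssl` module; the minimum version of TLS 1.2 is an assumption for illustration, not a documented requirement of our servers, and the helper names are hypothetical.

```python
import ssl


def build_tls_context():
    """Create a client context with certificate checks and modern defaults."""
    context = ssl.create_default_context()
    # Assumption: refuse anything older than TLS 1.2.
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


def supported_cipher_names(context):
    """List the cipher suites this client is able to offer."""
    return [cipher["name"] for cipher in context.get_ciphers()]
```

Logging the result of `supported_cipher_names` after each system upgrade gives you a record to compare against when HTTPS connections start to fail.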

6. Expect some changes to the API

We regard some changes to the API as backwards compatible. Please see our API change policy for a list.

🚧

API changes

Please make sure that your integration tolerates any and all types of changes that are in our list of backwards compatible changes.
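A tolerant parser reads only the fields it needs and ignores the rest. The sketch below assumes that added response fields are among the backwards compatible changes; check the API change policy for the actual list. The function `parse_product` and the field names `name`, `price`, and `shipping_costs` are hypothetical.

```python
def parse_product(payload):
    """Pick only the fields this integration uses; ignore everything else."""
    return {
        "name": payload.get("name"),
        "price": payload.get("price"),
        # A missing attribute becomes None instead of raising a KeyError.
        "shipping_costs": payload.get("shipping_costs"),
    }
```

Because unknown keys are never inspected, a newly added field in the response does not break this parser.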