Why WAFs Cannot Effectively Protect Sensitive APIs

Author: Brian Joe
Published on: May 16, 2024
Read time: 6 min

WAFs struggle with API protection

Historically, security teams have had decent success using a Web Application Firewall (WAF) to protect traditional web applications. However, they’ve struggled to achieve the same level of success using WAFs to protect APIs.

The main reason is that APIs and API attacks are more complex. Detecting and blocking those attacks well requires context, data, and granular control that WAFs typically don’t have. In this blog post, we’ll walk through an example to show where WAFs come up short: using a WAF to protect an API from an attacker who uses stolen tokens to abuse endpoints that process sensitive data.

WAFs lack endpoint inventory and risk context

WAFs are typically designed around the logical concepts of requests and IPs. They don’t understand the fundamental API concepts of endpoints and schemas, which means they are missing key context for detecting our example attack. In the dashboard shown below, for example, there’s no way to see how the requests flagged by the WAF are distributed across:

  • API endpoints that are actually in our inventory versus ones that don’t exist
  • Endpoints that process sensitive data versus ones that don’t
  • Endpoints that make up the schema for Product A versus Product B

In our example, the target for our protection policy is “API endpoints that process sensitive data.” Since the WAF doesn’t give us that information natively, we’ll have to try to build that list of endpoints on our own.

WAFs don’t understand API endpoints

Notice how the WAF lists the existing API endpoints? It has no concept of the API structure. It’s just a giant firehose of request traffic. Which endpoints are at higher risk? Which endpoints handle sensitive data? WAFs have no idea. They are designed around the logical concept of requests (which can hit any endpoint), not around endpoints, which are fundamental to APIs.
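To make the request-versus-endpoint gap concrete, here is a minimal Python sketch of the kind of grouping we would have to do ourselves: collapsing raw request paths into endpoint templates and counting traffic per endpoint. The function names and the heuristic (treat numeric and UUID path segments as parameters) are assumptions for illustration, not something a WAF provides.

```python
import re
from collections import Counter

# Assumed heuristic: numeric IDs and UUIDs in a path are parameters.
_UUID = re.compile(r"^[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}$")


def to_endpoint(method: str, path: str) -> str:
    """Collapse a raw request path into an endpoint template."""
    path = path.split("?", 1)[0]  # drop the query string
    parts = []
    for segment in path.strip("/").split("/"):
        if segment.isdigit() or _UUID.match(segment):
            parts.append("{id}")
        else:
            parts.append(segment)
    return f"{method.upper()} /" + "/".join(parts)


def endpoint_inventory(requests):
    """Count requests per endpoint template.

    `requests` is an iterable of (method, path) pairs.
    """
    return Counter(to_endpoint(method, path) for method, path in requests)
```

Even this toy version has to guess which path segments are parameters, which hints at why an inventory built from request traffic alone tends to be unreliable.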

WAF logs don’t have enough information

If we want to build a list of API endpoints, it looks like we’re going to have to look at the logs. But wait, our WAF doesn’t have logging enabled by default! Let’s get that turned on.

Now that logging is enabled, we need to query and inspect the logs, but we can’t do that from the WAF itself. Instead, we’ll need to use our SIEM (in this case, CloudWatch) to examine them in detail.

From these logs, we’ll want to figure out which API endpoints (referred to in WAF as “Paths”) contain sensitive data. Unfortunately, that data isn’t included in the WAF logs by default. Nor is data on API tokens, which we’ll need for the other component of our example.

This approach doesn’t address our use case.

Since we can’t solve this with our SIEM alone, let’s see if we can do it with a more complex business intelligence tool like AWS Athena.

Building a new data pipeline and enriching data logs

First, we need to set up a separate data pipeline that streams the logs through something like Kinesis into an S3 bucket before we can explore them with Athena. This isn’t trivial. As the reference architecture from AWS shows, if we have the skills, knowledge, and permissions within the security team, a few days of work will be involved. If we don't, we'll have to get buy-in from another team to build it for us.

Once we have the data pipeline, we need to get the missing data. This probably means getting development teams to make changes to log more API request/response body details and API tokens. We also need to sort out privacy issues from logging that data and issues that may arise from the increased log volume.

This isn’t easy to do organizationally. It might take weeks to get running and could prove impractical to maintain over the long term.

Finding sensitive data in request logs via SQL + RegEx

AWS Athena

Once we have the data pipeline and the data, and assuming we have the level of detail we need, we’ll need to use AWS Athena to query the logs and analyze the results to determine which API endpoints might be processing sensitive data.

For example:

-- Find logged responses whose body contains a card-number-shaped string.
SELECT *
FROM http_request_logs
WHERE regexp_like(response_body, '4[0-9]{12}(?:[0-9]{3})?')         -- Visa
   OR regexp_like(response_body, '5[1-5][0-9]{14}')                 -- MasterCard
   OR regexp_like(response_body, '3[47][0-9]{13}')                  -- American Express
   OR regexp_like(response_body, '3(?:0[0-5]|[68][0-9])[0-9]{11}')  -- Diners Club
   OR regexp_like(response_body, '6(?:011|5[0-9]{2})[0-9]{12}')     -- Discover
   OR regexp_like(response_body, '(?:2131|1800|35\d{3})\d{11}');    -- JCB

This may get us a working (albeit brittle) detection for a single, basic sensitive data type (like a credit card number).

To expand this to other sensitive data types (e.g., CVV numbers, expiry dates, driver’s license numbers), we’ll need significantly more query writing and false-positive tuning.

Finding abusive API tokens via custom SQL query

Now that we’ve found some API endpoints processing sensitive data, let’s look at using the WAF to build the detection for abusive API tokens.

The first challenge we’ll have to solve is how to log API tokens in a way that lets us query them without creating privacy issues. (This is not an easy fix.)
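One common way to approach this, sketched below in Python, is to log a keyed hash of each token rather than the token itself. The same token always produces the same fingerprint, so it can still be grouped and counted in queries. The function name, the key handling, and the truncation length are assumptions for illustration. Whether this satisfies a given privacy or compliance requirement is a separate question for each organization.

```python
import hashlib
import hmac


def token_fingerprint(token: str, key: bytes) -> str:
    """Return a stable, non-reversible identifier for an API token.

    `key` is a secret kept outside the logs (assumed to come from a
    secrets manager). Without it, a fingerprint cannot be recomputed
    from a guessed token.
    """
    digest = hmac.new(key, token.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:32]  # truncated for log readability (assumed length)
```

The catch is organizational rather than technical: every service that writes logs has to apply the same function with the same key, which is another change to ask development teams for.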

Assuming we find a way to do that, our next challenge will be to write a query that accurately detects tokens that show signs of abuse. This is more complex than the RegEx-based queries we used to detect sensitive data processing because it’s necessary to include an “over-time” element that indicates potentially abusive behavior (like generating excessive HTTP errors) for each unique token.

This might look something like this:

SELECT api_token, DATE_TRUNC('minute', timestamp) AS minute,
      COUNT(*) AS error_count
FROM api_logs
WHERE response_code >= 400 AND response_code < 600
GROUP BY api_token, DATE_TRUNC('minute', timestamp)
HAVING COUNT(*) > 10
ORDER BY minute;

Then, we’ll probably have to test and tune the HTTP error threshold by modifying the query and monitoring the results. Running this against a large dataset could also get expensive quickly. We won’t get into it here, but it will eventually become apparent that counting HTTP errors per unique token isn’t enough; we also need to count them per unique request. Asking for the same record 50 times and getting an error each time is different from asking for 50 different records and getting an error each time. Only the second pattern is possible evidence of abusive behavior.
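The difference between those two patterns can be sketched in a few lines of Python. This is an illustration only: the row layout (token, timestamp, path, response code) and the threshold are assumptions, and a real version would run as a query over the log store rather than in memory.

```python
from collections import defaultdict
from datetime import datetime


def suspicious_tokens(log_rows, threshold=10):
    """Flag tokens that fail on many *distinct* paths within one minute.

    `log_rows` is an iterable of (api_token, timestamp, path, response_code).
    Returns {(api_token, minute): distinct_failed_path_count}.
    """
    failed_paths = defaultdict(set)
    for token, timestamp, path, code in log_rows:
        if 400 <= code < 600:
            minute = timestamp.replace(second=0, microsecond=0)
            # A set keeps 50 retries of the same path as one entry.
            failed_paths[(token, minute)].add(path)
    return {
        key: len(paths)
        for key, paths in failed_paths.items()
        if len(paths) > threshold
    }
```

With this counting, a client that retries one missing record 50 times is recorded once, while a token that walks through 50 different record IDs exceeds the threshold.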

So while it’s possible in theory to use WAF logs to build a list of API endpoints that process sensitive data and a list of potentially abusive API tokens, doing so isn’t quick, easy, or scalable.

The attempt to create a WAF policy

Suppose we end up with a list of sensitive endpoints and a list of suspect tokens. Next, we want to use the WAF rule builder to create a rate-limiting policy that alerts on or blocks a suspected abusive API token when it sends too many requests to a given endpoint.

In a WAF, we’ll need to create a rule that matches any path on our list of sensitive endpoints, which lives on a spreadsheet somewhere, and then add another condition that matches any token on our list of abusive tokens, which lives on a different spreadsheet. As we can see from the example rule builder screenshot above, that isn’t practical for a security engineer to implement. A string-match rule with this many conditions is likely to have conflicts, be extremely difficult to maintain, and be riddled with false positives.

If we somehow made it this far, all we’ll need to do is commit to maintaining the data pipeline, ensuring developers continue logging things sufficiently while staying compliant, managing and tuning detection queries, and maintaining a very complicated WAF rule.

WAFs are not a workable solution

In summary, to implement a policy that protects an API from an attacker using stolen tokens to abuse API endpoints that process sensitive data, we’ll need to:

  1. Build a new data pipeline.
  2. Convince developers to make logging changes.
  3. Find a way around privacy and compliance issues associated with logging request/response data and tokens.
  4. Write, tune, and maintain SQL queries.
  5. Build and maintain a complex WAF rule.

In most cases, this simply won’t work.

API protection with Impart Security

Now let’s look at how to solve the same use case using Impart’s API security platform.

Detect endpoints that process sensitive data with built-in security functions

Impart comes with sensitive data detections out of the box. They are code-based detections that leverage a domain-specific language (DSL) to run automatically and generate lists of endpoints that process sensitive data, without you having to write, tune, or maintain queries. They are accurate, customizable, and can be ordered using a graph-based interface. This allows them to be layered in a way that reduces false positives. For example, a detection can look for credit card expiry fragments only when a higher-confidence detection (such as an algorithmically validated credit card number) has already fired.
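To illustrate the layering idea in general terms, here is a small Python sketch. It is not Impart’s DSL or implementation. It assumes that “algorithmically validated” means a Luhn checksum, that card numbers appear as unseparated digits, and that expiry dates look like MM/YY. All names are hypothetical.

```python
import re

# Assumed shapes: 13-19 unseparated digits, and MM/YY expiry dates.
CARD_CANDIDATE = re.compile(r"\b\d{13,19}\b")
EXPIRY = re.compile(r"\b(?:0[1-9]|1[0-2])/\d{2}\b")


def luhn_valid(number: str) -> bool:
    """Return True if the digits pass the Luhn checksum."""
    digits = [int(c) for c in number if c.isdigit()]
    if not 13 <= len(digits) <= 19:
        return False
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def detect(body: str) -> dict:
    """Report expiry dates only when a validated card number is present."""
    cards = [m.group() for m in CARD_CANDIDATE.finditer(body)
             if luhn_valid(m.group())]
    # Lower-confidence pattern runs only after the high-confidence one hits.
    expiries = [m.group() for m in EXPIRY.finditer(body)] if cards else []
    return {"card_numbers": cards, "expiry_dates": expiries}
```

Gating the weak pattern on the strong one means a string like "12/27" in an unrelated field is ignored unless a checksum-valid card number appears in the same body.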

Detect abusive API tokens with over-time detections

Impart comes out of the box with a full suite of JWT detections (available as Rule Templates) that can be used to identify potential authorization anomalies or forgeries. In addition to these detections, Impart can also detect abusive behavior over time, such as API tokens that are used within a short period from two different geographical locations, API tokens that are seeing high volumes of error response codes for different endpoints, or API tokens that are seeing excessive usage. Rule Templates can be easily customized to meet specific use cases. Similar to security functions, these detections can be layered using a graph interface to create highly accurate security policies that identify suspicious API tokens.

Dynamic lists and one simple API Firewall rule

With sensitive data and abusive API token detections turned on, Impart can generate dynamic lists for each. As new sensitive data endpoints and potentially abusive tokens are detected, they’ll be added to the lists automatically.

These lists can be referenced in a single rule that limits requests to any endpoint on the sensitive data endpoints list, from any token on the abusive tokens list. This rule can be created and tuned quickly using Impart’s rule templates and rule editor, and doesn’t require logging request/response bodies or API tokens.
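The logic of such a rule can be sketched generically in Python. This is not Impart’s rule syntax. The class name, the two sets, and the limit and window values are assumptions used only to show how one rule can reference two lists that change underneath it.

```python
import time
from collections import defaultdict, deque


class SensitiveEndpointLimiter:
    """Rate-limit only (suspect token, sensitive endpoint) pairs."""

    def __init__(self, sensitive_endpoints, suspect_tokens,
                 limit=5, window_seconds=60.0):
        # Both sets are assumed to be updated elsewhere by detections.
        self.sensitive_endpoints = sensitive_endpoints
        self.suspect_tokens = suspect_tokens
        self.limit = limit
        self.window = window_seconds
        self._hits = defaultdict(deque)

    def allow(self, token_id, endpoint, now=None):
        """Return True if the request may proceed."""
        if (endpoint not in self.sensitive_endpoints
                or token_id not in self.suspect_tokens):
            return True  # rule does not apply to this request
        now = time.monotonic() if now is None else now
        hits = self._hits[(token_id, endpoint)]
        while hits and now - hits[0] >= self.window:
            hits.popleft()  # forget requests outside the window
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True
```

Because the rule holds references to the lists rather than copies of their contents, adding an endpoint or a token to either set changes the rule’s behavior without editing the rule itself.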

A self-maintaining process to secure dynamic threats

Because all of these concepts are natively built into Impart, this entire process is self-maintaining. Endpoints with sensitive data are constantly cycled in and out of the endpoint list based on what data they are transmitting within a user-defined window. Similarly, abusive API tokens are constantly cycled in and out of the token list based on how they behave.
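The cycling behavior can be pictured as a list whose entries expire unless they keep being observed. The Python sketch below is a generic illustration under that assumption, not a description of how Impart implements its lists. The class and method names are hypothetical.

```python
class ExpiringList:
    """A list whose members drop out if not re-observed within a window."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self._last_seen = {}  # item -> time of most recent observation

    def observe(self, item, now):
        """Record that a detection fired for `item` at time `now`."""
        self._last_seen[item] = now

    def members(self, now):
        """Return current members, dropping any older than the window."""
        expired = [item for item, seen in self._last_seen.items()
                   if now - seen > self.ttl]
        for item in expired:
            del self._last_seen[item]
        return set(self._last_seen)
```

An endpoint that stops returning sensitive data, or a token that stops misbehaving, simply ages out once the window passes, so nobody has to prune the list by hand.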

Because this is all happening at runtime, this entire system can also detect things like new endpoints, new tokens, and new threats with very little maintenance or changes to firewall policies. Impart’s complete loop gives security teams the ability to detect, respond, and adapt to emerging security threats dynamically.

Wrapping Up

If you’ve made it this far, I hope you leave with a much deeper understanding of the pain involved with trying to defend against API attacks with a WAF, and how Impart makes this easier.

Learn more about Impart’s runtime API security approach by contacting us at try.imp.art and be sure to follow us on LinkedIn to stay in the loop.
