The Hidden Problem With CVSS: The Same CVE Gets Different Scores - Maze

November 25, 2025 Security

The Hidden Problem With CVSS: The Same CVE Gets Different Scores

NUNO LOPES

How are you supposed to prioritize vulnerabilities when the same CVE gets scored as a 9.8 Critical by one organization and a 4.4 Medium by another? When CVSS scores can't even agree on the basics, the system isn't helping you manage risk. It's adding to the noise.

Every security team I talk to has the same complaint: they're drowning in CVEs. Last year alone, over 40,000 new vulnerabilities were published, a 40% increase from the year before. When you're staring at a backlog that just keeps growing, the natural instinct is to reach for CVSS scores to decide what to fix first. Critical gets patched first, High comes next, and everything else... well, you cross your fingers and hope for the best. 

But there's a problem with this approach that most teams don't realize: CVSS scores for the same vulnerability vary wildly depending on who's doing the scoring. I'm not talking about minor variations in impact ratings. I mean the same CVE getting scored as a 9.8 Critical "remotely exploitable" vulnerability by one organization and a 4.4 Medium "local access required" vulnerability by another. That's not a scoring variation. It's a fundamental disagreement about how the vulnerability actually works, leading to completely different threat models.

The CVSS Inconsistency Problem

CVSS (Common Vulnerability Scoring System) was designed to give everyone a common language for talking about vulnerability severity. The idea is simple: a vulnerability gets scored across multiple dimensions like attack vector, complexity, and impact, then you get a number from 0 to 10. A 9.8 Critical should mean the same thing whether you're looking at it in your scanner, reading the NVD entry, or checking your vendor's advisory.
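That common language is machine-readable: the dimensions are encoded in a compact vector string, and the CVSS v3.1 specification maps the 0-10 score onto a qualitative severity band. A minimal Python sketch of both pieces (the parser is illustrative, not a full validator):

```python
def parse_cvss_vector(vector: str) -> dict:
    """Split a CVSS v3.x vector string into its metric/value pairs."""
    prefix, *metrics = vector.split("/")
    if not prefix.startswith("CVSS:"):
        raise ValueError(f"not a CVSS vector: {vector}")
    return dict(metric.split(":") for metric in metrics)

def severity_band(score: float) -> str:
    """Qualitative rating bands from the CVSS v3.1 specification."""
    if score == 0.0:
        return "None"
    if score < 4.0:
        return "Low"
    if score < 7.0:
        return "Medium"
    if score < 9.0:
        return "High"
    return "Critical"

metrics = parse_cvss_vector("CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H")
# metrics["AV"] == "N" (Network), and severity_band(9.8) == "Critical"
```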

Except it doesn't work that way.

The same CVE will have different CVSS scores depending on who's doing the scoring. Sometimes it's minor stuff. One vendor marks availability as High, another marks it Low. Annoying, but you can usually work with that. The real problem is when the inconsistencies are fundamental. We're talking about the same vulnerability being scored as remotely exploitable (attack vector: Network) by one source and requiring physical access (attack vector: Physical) by another. That's not a scoring variation. That's a completely different threat model.

And it happens because multiple organizations score the same CVE independently. NVD publishes scores, but so do vendors, CNAs, ADPs (Authorized Data Publishers) like CISA-ADP, and security platforms. They're supposed to be scoring the same intrinsic characteristics, but they often disagree on fundamental aspects like attack vector or complexity. The result: the same CVE can look High in one place, Medium in another, and Critical somewhere else.

It gets worse when you factor in CVSS versions. A vulnerability might be scored under CVSS 3.0, 3.1, and 4.0, producing different severity ratings for the exact same flaw. When these inconsistencies exist in the scores you're using to prioritize, they directly impact which vulnerabilities your engineers spend time on.

Real examples where CVSS gets it wrong

CVE-2024-38541

This Linux kernel vulnerability involves a buffer overflow in the of_modalias() function. If the buffer is too small for the initial snprintf() call, the length parameter goes negative, and the string parameter can point beyond the buffer's end.
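The dangerous pattern is that C's snprintf() returns the length the output *would* have needed, not how much was actually written. A Python model of that arithmetic (not the kernel code itself; the modalias string below is made up for illustration):

```python
def snprintf_return(buf_size: int, text: str) -> int:
    """Models C snprintf's return value: the full length the output
    would need, even when it gets truncated to fit buf_size."""
    return len(text)

buf_size = 8
needed = snprintf_return(buf_size, "of:Nserial0T<NULL>")  # hypothetical modalias
remaining = buf_size - needed  # -10: the C code kept this as the new length
# In the vulnerable path, the next write then started at buffer + needed,
# a pointer past the end of the 8-byte buffer, with a negative length.
```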

CISA-ADP scored this as a 9.8 Critical with attack vector Network (CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H).

That's essentially saying an attacker can exploit this remotely over the internet with no privileges required, no user interaction, and achieve high impact across confidentiality, integrity, and availability.

But IBM X-Force tells a completely different story. They scored it as a 4.4 Medium with attack vector Local (CVSS:3.1/AV:L/AC:L/PR:H/UI:N/S:U/C:N/I:N/A:H). 

According to their assessment, an attacker needs local access and high privileges to trigger this vulnerability. That's not a remote internet threat. That's someone who already has significant access to your system.

The difference between these two assessments isn't subtle. One says "remotely exploitable critical vulnerability, patch immediately." The other says "local privilege escalation requiring high privileges, medium priority."
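You can surface the disagreement mechanically by diffing the two published vector strings (a small sketch; the vectors are the ones quoted above):

```python
def parse(vector: str) -> dict:
    """Metric/value pairs from a CVSS v3.x vector string."""
    _, *metrics = vector.split("/")
    return dict(m.split(":") for m in metrics)

def disagreements(v1: str, v2: str) -> dict:
    """Every metric where two vectors for the same CVE don't match."""
    a, b = parse(v1), parse(v2)
    return {k: (a[k], b.get(k)) for k in a if a[k] != b.get(k)}

cisa = "CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H"
ibm  = "CVSS:3.1/AV:L/AC:L/PR:H/UI:N/S:U/C:N/I:N/A:H"
diff = disagreements(cisa, ibm)
# {'AV': ('N', 'L'), 'PR': ('N', 'H'), 'C': ('H', 'N'), 'I': ('H', 'N')}
```

Four of the eight base metrics disagree, including the attack vector itself.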

If you're prioritizing based on the CISA score, you might be pulling engineers off other work for what's actually a much lower risk issue. If you're using the IBM score, you might be ignoring something that could be remotely exploitable.

The reality is that one of these assessments is fundamentally wrong about how this vulnerability works. And this isn't an edge case.

CVE-2025-0665

This vulnerability in libcurl involves a double-close of an eventfd file descriptor during connection teardown after a threaded name resolution. The same file descriptor gets closed twice due to a mistaken #ifdef that left a superfluous close() call in the code.
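Why is a double close() dangerous rather than merely sloppy? Because the OS recycles descriptor numbers: after the first close, the same number can be handed to something else, and the second close then tears down the wrong resource. A Python sketch of the hazard (using a pipe as a stand-in for curl's eventfd, which isn't the actual curl code):

```python
import os

r, w = os.pipe()             # stand-in for the eventfd descriptor pair
os.close(r)                  # first close: legitimate

# Some other code (or thread) opens a file; POSIX hands out the lowest
# free descriptor number, which is exactly the one just released.
reused = os.open(os.devnull, os.O_RDONLY)
assert reused == r

os.close(r)                  # the "double close": it succeeds, but it
                             # silently closes the new owner's descriptor
try:
    os.read(reused, 1)       # the new owner now hits EBADF
    raise RuntimeError("expected OSError")
except OSError:
    pass
finally:
    os.close(w)
```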

CISA-ADP scored this as a 9.8 Critical with attack vector Network (CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H). 

According to this assessment, an attacker can remotely exploit this over the internet with no privileges, no user interaction, and achieve high impact on confidentiality, integrity, and availability.

Red Hat scored it completely differently: CVSS:3.1/AV:L/AC:L/PR:N/UI:N/S:U/C:N/I:N/A:L. They rated it as Local attack vector with Low availability impact only.

The curl project itself, which discovered and fixed the bug, went even further. They assigned it a severity rating of Low. Their reasoning is documented in their security advisory: the bug requires specific build configurations (threaded resolver, eventfd support, 64-bit architecture), both close() calls happen within dozens of instructions of each other (limiting exploitation windows), and the bug causes unreliable behavior that users noticed and avoided.

So we have the same vulnerability rated as:

  • 9.8 Critical by CISA (remotely exploitable, high impact across the board)
  • Medium by Red Hat (local only, low availability impact)
  • Low by the curl project (the people who actually wrote and fixed the code)


Attack vector misclassification

These aren't isolated incidents. Attack vector misclassification is common enough that it fundamentally undermines CVSS as a prioritization tool. A vulnerability marked as Network exploitable (AV:N) might get treated as a Critical emergency. Mark that same vulnerability as Local (AV:L), and it drops to Medium priority. The difference in how your team responds is massive, even though the actual vulnerability hasn't changed.
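The numeric weight CVSS v3.1 assigns to the attack vector makes this concrete. Below is a sketch of the base-score formula, restricted to Scope: Unchanged (enough for the vectors in this post), with the constants from the FIRST.org v3.1 specification, applied to CVE-2024-38541's two published vectors:

```python
# CVSS v3.1 base-metric weights (Scope: Unchanged), per the FIRST.org spec
AV  = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.20}
AC  = {"L": 0.77, "H": 0.44}
PR  = {"N": 0.85, "L": 0.62, "H": 0.27}  # Scope: Unchanged values
UI  = {"N": 0.85, "R": 0.62}
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}

def roundup(x: float) -> float:
    """CVSS v3.1 Roundup: smallest one-decimal number >= the input."""
    i = int(round(x * 100000))
    return i / 100000 if i % 10000 == 0 else (i // 10000 + 1) / 10

def base_score(av: str, ac: str, pr: str, ui: str,
               c: str, i: str, a: str) -> float:
    """Base score for Scope: Unchanged vectors only."""
    iss = 1 - (1 - CIA[c]) * (1 - CIA[i]) * (1 - CIA[a])
    impact = 6.42 * iss
    exploitability = 8.22 * AV[av] * AC[ac] * PR[pr] * UI[ui]
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))

cisa    = base_score("N", "L", "N", "N", "H", "H", "H")  # CISA's vector: 9.8
av_only = base_score("L", "L", "N", "N", "H", "H", "H")  # flip just AV: 8.4
ibm     = base_score("L", "L", "H", "N", "N", "N", "H")  # IBM's vector: 4.4
```

Flipping the attack vector alone knocks the score from 9.8 Critical to 8.4 High; combined with the other metric disagreements, it lands at IBM's 4.4 Medium.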

Why manual review doesn't scale

You might be thinking: just manually review each CVE to determine the correct attack vector and actual exploitability. And you'd be right that this would solve the problem. But it's not realistic.

Over 40,000 CVEs were published in 2024. That's roughly 110 new vulnerabilities every single day. Even if you only reviewed the Critical and High severity ones, you're still looking at thousands of CVEs that would each require deep technical analysis to understand the actual exploitation path.

And it's not a quick review. To properly assess whether CVE-2024-38541 is Network or Local exploitable, you need to understand Linux kernel boot sequences, device tree parsing, and when different subsystems initialize. That's hours of work per CVE, and you have thousands of them.

Most security teams don't have the staffing to manually investigate even a fraction of their vulnerability backlog. So they fall back on CVSS scores, knowing the scores are inconsistent, because there's no other scalable option.

Solving CVSS inconsistencies at scale

AI agents can investigate vulnerabilities the same way a senior security engineer would, examining the actual exploitation conditions rather than relying on inconsistent CVSS scores.

Let's look at how an AI agent analyzed CVE-2024-38541, the Linux kernel vulnerability where CISA scored it as remotely exploitable with attack vector Network (9.8 Critical), but IBM scored it as requiring local access (4.4 Medium).

The AI agent's job was to determine the correct attack vector. To do this, it needed to understand how the vulnerability actually works and what access an attacker would need to exploit it.

The agent examined the of_modalias() function and discovered something critical: this function takes a device tree as an argument. A device tree is a data structure that describes the hardware configuration and is processed from local files during the kernel boot process.

But the agent didn't stop there. It investigated when this function gets called in the boot sequence. The analysis revealed that the device tree processing happens during early kernel initialization, before the network stack is even started. The vulnerable code path executes when the system is parsing hardware configuration from local files, not when it's processing network traffic.

The agent's conclusion on the attack vector

Local, not Network. An attacker would need local access to the system to trigger this vulnerability. Remote exploitation over the network is impossible because the network stack hasn't been initialized yet when this code runs.

This is exactly the kind of analysis that reveals CVSS inconsistencies. CISA's assessment of the attack vector Network (AV:N) was fundamentally wrong about how the vulnerability works. 

And this scales. We can run this type of investigation across thousands of CVEs, correcting attack vector misclassifications and giving security teams accurate information about which vulnerabilities are actually remotely exploitable versus which ones require local access.

What should you do?

CVSS isn't going away, and that's fine. The problem isn't that CVSS exists. The problem is relying on it as your only source of truth, even though it was never designed to be a complete prioritization system on its own.

Use CVSS scores as one data point among many. Look at CISA's KEV catalog to see what's being actively exploited. Check vendor advisories for context. But don't stop there. The vulnerabilities that matter are those where an attacker can actually reach the vulnerable code and meet the conditions needed for exploitation. As a security engineer myself, when I'm investigating a CVE, I consistently check a few places:

  • The project's official security advisory (like the curl advisory for CVE-2025-0665) - vendors who built the software often have the most accurate technical details
  • Vendor-specific CVE databases - Red Hat, Ubuntu, and SUSE often publish detailed assessments with their reasoning
  • The actual commit that fixed the vulnerability - GitHub/GitLab commits show exactly what changed and often include technical context in the commit message
  • Exploit databases and POCs - if a working exploit exists, that tells you a lot about actual exploitability (remember, this is only part of the equation)

If you find yourself checking the same sources repeatedly for every CVE, that's a perfect opportunity to build an agent that scans those sites and synthesizes the findings automatically. Start simple: an agent that pulls the official advisory, the vendor assessment, and the fix commit for any CVE you're investigating. That's a practical way to bring AI into your vulnerability management process without overcomplicating it.
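Even the non-AI plumbing for that agent is simple: build the URL for each source, fetch it, and line the scores up side by side. A minimal sketch (the NVD API endpoint and response shape are documented; the vendor page URL patterns are illustrative and worth verifying before relying on them):

```python
import json
import urllib.request

NVD_API = "https://services.nvd.nist.gov/rest/json/cves/2.0"

def source_urls(cve_id: str) -> dict:
    """The handful of places worth checking for any CVE."""
    return {
        "nvd": f"{NVD_API}?cveId={cve_id}",
        "redhat": f"https://access.redhat.com/security/cve/{cve_id}",
        "mitre": f"https://www.cve.org/CVERecord?id={cve_id}",
    }

def cvss_scores_from_nvd(payload: dict) -> list:
    """Pull every (source, score, vector) triple out of an NVD 2.0 API
    response so disagreements between scorers are visible side by side."""
    triples = []
    for vuln in payload.get("vulnerabilities", []):
        metrics = vuln["cve"].get("metrics", {})
        for version in ("cvssMetricV31", "cvssMetricV30", "cvssMetricV2"):
            for entry in metrics.get(version, []):
                data = entry["cvssData"]
                triples.append((entry["source"], data["baseScore"],
                                data["vectorString"]))
    return triples

def fetch_scores(cve_id: str) -> list:
    """Live lookup: one CVE in, every published NVD score out."""
    with urllib.request.urlopen(source_urls(cve_id)["nvd"]) as resp:
        return cvss_scores_from_nvd(json.load(resp))
```

If `fetch_scores` returns triples whose attack vectors disagree, that CVE is exactly the kind that deserves a human (or agent) investigation before it drives your patching priority.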

Having this process in place helps keep our teams from chasing false alarms based on inflated CVSS scores or ignoring real threats because the scores don't reflect actual exploitability. We can do better than crossing our fingers and hoping we patched the right things.