[C++] [Question] How to detect taint on elements in a collection #18098

Open
JustusAdam opened this issue Nov 25, 2024 · 2 comments
Labels
question Further information is requested

Comments

@JustusAdam

I am trying to detect the flow into potential_leak in the following simplified code. This is just a minimal example; the vector can be constructed in any way, e.g. with a series of push_back calls or via iterators, and I'm trying to find a way to reliably detect taint on any of its elements at the sink location. Also assume that I do not have access to the source code of potential_leak and thus cannot detect the taint at the point where the elements are accessed.

std::vector<int> v { sensitive_data };
potential_leak(v);

My simplified query is

import cpp
import semmle.code.cpp.dataflow.new.TaintTracking

module TaintConfig implements DataFlow::ConfigSig {
  predicate isSource(DataFlow::Node source) {
    exists(VariableAccess v | 
      v.getTarget().getName() = "sensitive_data" 
    }
  }

  predicate isSink(DataFlow::Node sink) {
    exists(Call c |
      c.getTarget().getName() = "potential_leak" and
      c.getArgument(0) = e
    )
  }
}

module Flow = TaintTracking::Global<TaintConfig>;

from DataFlow::Node src, DataFlow::Node sink
where Flow::flow(src, sink)
select src, sink

However, this does not detect the flow. Is there some way to select the elements inside v as sinks for this query?

CodeQL version: 2.19.3

@JustusAdam JustusAdam added the question Further information is requested label Nov 25, 2024
@redsun82
Contributor

👋 @JustusAdam

I'm guessing you might have edited your code snippet leaving out some information (the exists in isSource is not closed, and there's an undefined e in isSink).

However, trying out this example, it would indeed seem we don't currently track taint through vectors. I will ask my colleagues if it's really the case.

In the meantime, this seems to cover your simple example, by defining additional flow steps:

import cpp
import semmle.code.cpp.dataflow.new.TaintTracking

module TaintConfig implements DataFlow::ConfigSig {
  predicate isSource(DataFlow::Node source) {
    source.asExpr().(VariableAccess).getTarget().getName() = "sensitive_data"
  }

  predicate isAdditionalFlowStep(DataFlow::Node lhs, DataFlow::Node rhs) {
    exists(ConstructorCall c |
      c.getTarget().getName() = ["vector", "initializer_list"] and
      c = rhs.asExpr() and
      c.getAnArgument() = lhs.asExpr()
    )
  }

  predicate isSink(DataFlow::Node sink) {
    exists(Call c |
      c.getTarget().getName() = "potential_leak" and
      c.getArgument(0) = sink.asExpr()
    )
  }
}

module Flow = TaintTracking::Global<TaintConfig>;

from DataFlow::Node src, DataFlow::Node sink
where Flow::flow(src, sink)
select src, "flow to $@", sink, sink.toString()

Note, however, that modelling every way in which an element can be inserted into a vector might be tricky: push_back, emplace, assign and insert at a minimum, but also iterator-based routes such as std::back_inserter or std::iota, the iterator overloads of the constructors, assign and insert, and probably other ways. I will circle back once I have asked whether there is a better way.
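To make that list concrete, here is a rough C++ sketch of the insertion routes mentioned above, each ending in a call to the sink. It could serve as a small test file to run the query against. The definitions of sensitive_data and potential_leak are hypothetical stand-ins (the question does not show them), and the via_* function names are made up for illustration. Nothing here says which routes the query actually detects.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <numeric>
#include <vector>

// Hypothetical stand-ins for the names used in the question.
int sensitive_data = 42;

// Sink stub: returns the element sum so callers can observe what arrived.
int potential_leak(const std::vector<int>& v) {
    assert(!v.empty());
    return std::accumulate(v.begin(), v.end(), 0);
}

// Construction from an initializer_list (the case in the question).
int via_initializer_list() {
    std::vector<int> v{sensitive_data};
    return potential_leak(v);
}

int via_push_back() {
    std::vector<int> v;
    v.push_back(sensitive_data);
    return potential_leak(v);
}

int via_emplace_back() {
    std::vector<int> v;
    v.emplace_back(sensitive_data);
    return potential_leak(v);
}

int via_insert() {
    std::vector<int> v;
    v.insert(v.begin(), sensitive_data);
    return potential_leak(v);
}

int via_assign() {
    std::vector<int> v;
    v.assign(1, sensitive_data);  // one copy of the value
    return potential_leak(v);
}

// Iterator-range overload of the constructor.
int via_iterator_range() {
    const int src[] = {sensitive_data};
    std::vector<int> v(std::begin(src), std::end(src));
    return potential_leak(v);
}

// Elements appended through an output iterator.
int via_back_inserter() {
    const int src[] = {sensitive_data};
    std::vector<int> v;
    std::copy(std::begin(src), std::end(src), std::back_inserter(v));
    return potential_leak(v);
}

// std::iota writes sensitive_data, sensitive_data + 1, ... into the range.
int via_iota() {
    std::vector<int> v(3);
    std::iota(v.begin(), v.end(), sensitive_data);
    return potential_leak(v);
}
```

Keeping each route in its own function makes it easy to see from the query results which routes produce a flow and which are missed.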

@redsun82
Contributor

👋 @JustusAdam

  • It turns out we do model flow through many vector constructors and member functions; the full list of things covered can be found here. You may notice that:
    • push_back, emplace_back, insert, and assign from an iterator are covered
    • construction from an initializer_list is not. I've opened an internal issue to cover that, so we may have that in the future (though I cannot commit to any specific roadmap). In the meantime, the isAdditionalFlowStep I provided should cover it.
  • If a sink is a function call argument that is a container, like the argument to potential_leak in your example, then a container holding tainted elements will indeed be implicitly tainted as well. So nothing needs to be done to get that.
  • One does, however, have to take care when defining sources, sinks and additional flow steps to use the correct predicate among Node.asExpr and Node.asIndirectExpr. The former deals with taint propagating through values, the latter with taint propagating through references. In your case we should, for example, use asIndirectExpr in isSink if potential_leak takes the vector by reference. Since push_back and emplace_back also take their arguments by reference, we may want to cover both cases in isSource by using [source.asExpr(), source.asIndirectExpr()] instead of just source.asExpr().
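The value versus reference distinction in the last point depends on the C++ signatures involved, which the question does not show. Below is a minimal sketch of the two sink shapes. The names potential_leak_by_value and potential_leak_by_ref, and the definition of sensitive_data, are hypothetical and only there to tell the cases apart. Going by the comment above, the by-reference variant is the one where asIndirectExpr would be the predicate to use in isSink.

```cpp
#include <cassert>
#include <vector>

// Hypothetical source, standing in for the variable in the question.
int sensitive_data = 7;

// Sink that takes the vector by value: the callee receives its own copy.
int potential_leak_by_value(std::vector<int> v) {
    assert(!v.empty());
    return v.front();
}

// Sink that takes the vector by const reference: the callee reads the
// caller's object through the reference.
int potential_leak_by_ref(const std::vector<int>& v) {
    assert(!v.empty());
    return v.front();
}

int call_by_value() {
    std::vector<int> v;
    // push_back takes its argument by reference (const int& here), so the
    // source also enters the container through a reference.
    v.push_back(sensitive_data);
    return potential_leak_by_value(v);
}

int call_by_ref() {
    std::vector<int> v;
    v.push_back(sensitive_data);
    return potential_leak_by_ref(v);
}
```

Running the query against both call sites, once with asExpr and once with asIndirectExpr in isSink, would be one way to check which predicate matches which signature in a given CodeQL version.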
