Dailydave mailing list archives

Re: Tooling, Graph Databases, etc.


From: Andre Gironda via Dailydave <dailydave () lists aitelfoundation org>
Date: Tue, 27 Jun 2023 10:01:16 -0700

Maybe not from a vuln hunter perspective, but from a threat hunter view we
usually mill out to targeting platforms, e.g., Vertex Synapse (commercial)
and/or MISP (open-source) for graph-ready pivoting and analysis

For example, TheDFIRReport hosts a MISP that contexts the process objects
template --
https://github.com/MISP/misp-objects/blob/main/objects/process/definition.json
-- and fills it up with ATT&CK tags.
Sure, I'd also context that with YARA or capa rules, but MISP etc. also
supports embedding and mapping those. The VirusTotal capa support is not
the full-context kind you get from the entire tool (VTE doesn't show the
`-vv` output, only the top-level jank), and it's also not at the
capa-explorer level where you get the function mappings. Still, I'd be
curious to see the behavior profiling and livehunting (the YARA vt module
now supports Sigma-style rule checking against the behavior profiles)
across, say, the Immunity Security CANVAS leaked binaries that made it up
on VT years back, or other VT-visible exploit code.
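
To make the process-object idea concrete, here is a minimal sketch in plain
Python (no MISP client library) of what such an object with ATT&CK tags
might look like. The relation names (pid, parent-pid, name, command-line)
are assumed to match the template linked above, and the tag string follows
the MISP galaxy naming style as I understand it; every value is invented
for illustration and none of it is taken from TheDFIRReport's instance.

```python
import json


def make_process_object(pid, parent_pid, name, command_line, attack_tags):
    """Build a plain dict shaped like a MISP 'process' object (assumed layout)."""
    fields = {
        "pid": pid,
        "parent-pid": parent_pid,
        "name": name,
        "command-line": command_line,
    }
    attributes = [
        {"object_relation": relation, "value": str(value)}
        for relation, value in fields.items()
    ]
    # Attach the ATT&CK tags to the command line, where the technique shows up.
    for attr in attributes:
        if attr["object_relation"] == "command-line":
            attr["Tag"] = [{"name": tag} for tag in attack_tags]
    return {"name": "process", "Attribute": attributes}


# Hypothetical sample data; the encoded payload is deliberately left out.
obj = make_process_object(
    pid=4242,
    parent_pid=808,
    name="powershell.exe",
    command_line="powershell.exe -nop -w hidden -enc <elided>",
    attack_tags=['misp-galaxy:mitre-attack-pattern="PowerShell - T1059.001"'],
)
print(json.dumps(obj, indent=2))
```

Once the object carries both the process fields and the technique tags, the
graph pivot is "show me every process object sharing this tag or this
parent", which is the kind of query these platforms are built around.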

What are the core problems we're attempting to solve here? For malware
analysis, it's typically mapping in changes, historical views, popular
function analysis, and evolution.
For function analysis, all of the reversing platforms, e.g., IDA Pro (sub
BinaryNinja, sub Ghidra, sub Radare2/Cutter), have a capa-explorer plugin
of sorts, and usually a commonly-found function analysis routine (for IDA
Pro it's called Lumina, but each of these also supports the concept, e.g.,
radare2 zignatures). Then there's also Writeprint and other more classic
techniques that work very broadly.
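
As a rough illustration of the commonly-found-function idea, here is a
sketch of masked byte-pattern matching in Python. This is NOT how Lumina
works and it is not radare2's zignature format; it only shows the general
shape of "known bytes plus wildcards map to a known name". The Signature
class, the identify helper, and the frame_setup_small name are all
hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signature:
    name: str
    pattern: bytes  # expected bytes at the start of the function
    mask: bytes     # 0xff = byte must match, 0x00 = wildcard

    def matches(self, code: bytes) -> bool:
        """True if the start of `code` agrees with the pattern under the mask."""
        if len(code) < len(self.pattern):
            return False
        return all(
            (c & m) == (p & m)
            for c, p, m in zip(code, self.pattern, self.mask)
        )


def identify(code: bytes, signatures):
    """Return the names of every signature matching the start of `code`."""
    return [sig.name for sig in signatures if sig.matches(code)]


# x86-64: push rbp; mov rbp, rsp; sub rsp, imm8 (the imm8 is wildcarded).
SIGS = [
    Signature(
        name="frame_setup_small",
        pattern=bytes.fromhex("554889e54883ec00"),
        mask=bytes.fromhex("ffffffffffffff00"),
    ),
]

print(identify(bytes.fromhex("554889e54883ec20c3"), SIGS))
```

Real signature schemes also mask out relocations and call targets and
usually hash the result so lookups scale; the wildcard mask above is the
smallest piece of that.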

In terms of changes and evolution, malware analysis on the reversing
platforms hasn't seen much in tooling, but they are capable. There's ASM2VEC
(of course, very different compared to node2vec/graph2vec but having
relevance) as seen in IDA Pro plugins such as --
https://github.com/McGill-DMaS/Kam1n0-Community -- (more on kam1n0 here --
https://mcgillnews.mcgill.ca/s/1762/news/interior.aspx?sid=1762&gid=2&pgid=2292
)

Is there a knowledgebase approach that is cross-functional and cross
vuln-/mal-/exploit-/threat- efforts to be had here?

On Tue, Jun 27, 2023 at 9:24 AM Shane Macaulay via Dailydave <
dailydave () lists aitelfoundation org> wrote:

There is joernio's ghidra2cpg; not sure why they now seem to be pushing a
forked set of patches at https://github.com/joernio/ghidra, probably the DB
format changes too rapidly or some other "we automatically intake unknown
relationships lost statically" reason.  That might get part of what you're
looking for, even though it isn't an exact fit.  Bringing in some
higher-level tooling helps: all the GraphQL UIs that contextualize queries
with type context are so helpful.  Whenever I don't have context-aware
syntax support, the barrier to actually doing anything limits my
enthusiasm, so that only the most impactful ideas (as perceived before
getting too far) get my attention (and I'm often wrong, so :).  I forget
whether joern still uses Neo4j, but I am confident that it's the best FOSS
available for describing code/binaries right now.

Getting more tools in this space is a great initiative that deserves
attention.  Being able to communicate so expressively, codifying knowledge
for bugs, with some helpers around supporting guided generation of queries
for arbitrary conditions: the benefits for invariant analysis (as can be
seen with Semmle/CodeQL) are extreme.

On Mon, Jun 26, 2023 at 3:46 PM Dave Aitel via Dailydave <
dailydave () lists aitelfoundation org> wrote:

There's a new Ghidra release last week! Lots of improvements to the
debugger, which is awesome. But this brings up some thoughts that have been
triggering my vulnerability-and-exploitation-specific OCD for some time now.

Behind every good RE tool is a crappy crappy database. Implicitly we, as
a community, understand there is no good reason that every reverse
engineering project needs to implement a key-value store, or a B-Tree
<https://github.com/NationalSecurityAgency/ghidra/tree/master/Ghidra/Framework/DB/src/main/java/db>,
or partner with a colony of bees which maintain tool state by
various wiggly dances. But yet each and every tool has a developer with
decades of reverse engineering experience on rare embedded platforms either
building custom indexes in a pale imitation of a real DB structure or
engaging in insect-based diplomacy efforts.

I think the Ghidra team (and Binja/IDA teams!) are geniuses, but they are
probably NOT geniuses at building database engines. And reading through the
issues <https://github.com/NationalSecurityAgency/ghidra/issues/985>
with ANY reverse engineering product you find that performance even for the
base feature-set is a difficult ask.

My plea is this: We need to port Ghidra to Neo4j as soon as possible.
Having a real Graph DB store underneath Ghidra solves the scalability
issues. I understand the difficulty here is: There are few engineers who
understand both Neo4j and reverse engineering to the point where this can
be done. I mean, why do it in Neo4j and not Postgres? An argument can be
made for both, in the sense that Postgres is truly Free and the most solid
DB on the market. The plus for Neo4j is that RE data is typically more
graph-based than linear.
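
To illustrate the graph-versus-relational tradeoff, here is a small sketch
of the question "which functions can reach memcpy?" asked of a call graph
stored in a relational table. It uses Python's built-in sqlite3 as a
stand-in for Postgres (both support WITH RECURSIVE). The table layout,
function names, and the Cypher in the comment are all hypothetical; none of
this reflects Ghidra's actual storage.

```python
import sqlite3

# Hypothetical call graph as (caller, callee) pairs.
CALLS = [
    ("main", "parse_header"),
    ("main", "log_msg"),
    ("parse_header", "read_field"),
    ("read_field", "memcpy"),
    ("log_msg", "printf"),
]


def callers_reaching(conn, sink):
    """Return every function with a call path to `sink`, sorted by name."""
    rows = conn.execute(
        """
        WITH RECURSIVE reach(fn) AS (
            SELECT caller FROM calls WHERE callee = ?
            UNION  -- UNION (not UNION ALL) dedups, so cycles terminate
            SELECT c.caller FROM calls AS c JOIN reach AS r ON c.callee = r.fn
        )
        SELECT fn FROM reach ORDER BY fn
        """,
        (sink,),
    )
    return [fn for (fn,) in rows]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE calls (caller TEXT, callee TEXT)")
conn.executemany("INSERT INTO calls VALUES (?, ?)", CALLS)

# A roughly equivalent Cypher query, assuming Function nodes and CALLS edges:
#   MATCH (f:Function)-[:CALLS*1..]->(:Function {name: 'memcpy'})
#   RETURN DISTINCT f.name
print(callers_reaching(conn, "memcpy"))  # ['main', 'parse_header', 'read_field']
```

Both engines can answer it. The difference is that the graph query states
the traversal in one pattern, while the relational version has to spell out
the recursion by hand, and that gap widens as the questions get more
path-shaped.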

I spent the last two years learning graph DBs out of some masochistic
desire and ended up getting certified, and I can still RE a little bit. I
will manage the team porting Ghidra to Neo4j if someone funds it. :)

Either way, sooner is better than later. There are so many companies and
people relying on these tools that it seems silly to do anything else.

-dave
P.S. Yes, I remember BinNavi used MSSQL installs for its data, and this
was annoying to install but ... I get why Halvar did it at the time. It's
because he had real work to do and building a DB was not it. I can only
assume Reven doesn't use their own DB? I mean the benefits for
interoperability would be huge between tools. . . like literally everything
you want to do with these tools is better with a real DB underneath.


_______________________________________________
Dailydave mailing list -- dailydave () lists aitelfoundation org
To unsubscribe send an email to dailydave-leave () lists aitelfoundation org
