Honeypots mailing list archives

Re: correlating sys_read data to "source" ip

From: Camilo Viecco <cviecco () indiana edu>
Date: Thu, 17 Aug 2006 10:31:40 -0400

Hi Troy...

<mini_rant reason='insufficient sleep time' extra='please forgive me for
any harshness, we need more documentation'>
The walleye dataset is a relational dataset. Many questions can only be
answered by joining results from multiple tables.
You should look at the paper "Towards a third generation data capture
architecture for honeynets", which describes the philosophy in
more detail. You could also look at how walleye combines the tables to
generate answers (ugly but enlightening). There are also limits on what
can be asked in a single query...
</ mini_rant>
But let me be more helpful. Let me restate what I think you are asking:
given a process ID, which IP address generated the activity?
In general that question cannot be answered, because many processes never
connect to the outside world.
Further, many processes, like sshd, spawn children to handle activity.
Sshd, for example, usually spawns a user shell after authentication, so
the read activity from a bash shell is not directly related to the socket.
That is one of the reasons why the process tree viewer in walleye is
extremely important.

Thus, in the general case (with the current dataset) it is not possible to
accurately answer that question in a single query.
You might want to think in terms of process hierarchies in order to get
a plausible answer to the question:
1. Is my process related to any argus flow?
2. Is my parent process related to any argus flow? (and recurse)

For ssh you can usually find the related flow through the parent or the
parent's parent pid.
It is not pretty, but it is the best we can do (for now).

So, the steps:
1. Find the process you are interested in:
    select sensor_id, process_id from sys_read where ****YOUR_CRITERIA_HERE***
2. Find the sockets related to that process (maybe you are lucky):
    select sensor_id, argus_id from sys_socket where process_id = (your previous result)
3. If there are sockets, use the argus_id to query the argus table for
the flow information (you can actually combine all of this into one
large query if you are using a more recent version of MySQL).
4. If unsuccessful AND you know that the activity of interest is related
to a network connection, find the parent process id:
    select * from process_tree where child_process_id = your_current_process_id_of_interest
5. Using this new information, go back to step 2 (iterate until you get
tired, find the answer, or find too many answers).
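
The loop in steps 2 to 5 can be sketched in a few lines of Python. This is
only an illustration against an in-memory SQLite toy database, not the real
walleye schema: the table names and the columns sensor_id, process_id,
argus_id and child_process_id come from the queries above, while
parent_process_id, src_ip, dst_ip and the sensor_id column on process_tree
are assumptions made for the example. Step 1 (choosing the process from
sys_read) is left out because it depends on your own criteria.

```python
import sqlite3

# Toy schema. ASSUMED for illustration: parent_process_id, src_ip, dst_ip,
# and the presence of sensor_id on process_tree. Check the real tables.
SCHEMA = """
CREATE TABLE sys_socket   (sensor_id INTEGER, process_id INTEGER, argus_id INTEGER);
CREATE TABLE argus        (sensor_id INTEGER, argus_id INTEGER, src_ip TEXT, dst_ip TEXT);
CREATE TABLE process_tree (sensor_id INTEGER, child_process_id INTEGER, parent_process_id INTEGER);
"""

# Steps 2 and 3 combined: sockets of a process joined to their argus flows.
FLOWS_FOR_PROCESS = """
SELECT a.argus_id, a.src_ip, a.dst_ip
FROM sys_socket AS s
JOIN argus AS a ON a.sensor_id = s.sensor_id AND a.argus_id = s.argus_id
WHERE s.sensor_id = ? AND s.process_id = ?
"""

# Step 4: look up the parent of a process.
PARENT_OF = """
SELECT parent_process_id FROM process_tree
WHERE sensor_id = ? AND child_process_id = ?
"""


def find_flows(conn, sensor_id, process_id, max_depth=8):
    """Return (depth, pid, flows) for the nearest ancestor owning a socket.

    depth 0 is the process itself, 1 its parent, and so on. Returns None
    if no ancestor within max_depth levels has a socket.
    """
    seen = set()
    pid = process_id
    for depth in range(max_depth + 1):
        if pid is None or pid in seen:  # ran off the tree, or hit a cycle
            return None
        seen.add(pid)
        flows = conn.execute(FLOWS_FOR_PROCESS, (sensor_id, pid)).fetchall()
        if flows:  # may be several flows ("too many answers")
            return depth, pid, flows
        row = conn.execute(PARENT_OF, (sensor_id, pid)).fetchone()
        pid = row[0] if row else None  # step 5: retry with the parent
    return None  # "got tired"


def build_toy_db():
    """Made-up data: sshd listener 100 -> per-session sshd 200 -> bash 300."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO process_tree VALUES (?, ?, ?)",
                     [(1, 200, 100), (1, 300, 200)])
    conn.execute("INSERT INTO sys_socket VALUES (1, 200, 7)")
    conn.execute("INSERT INTO argus VALUES (1, 7, '192.0.2.10', '192.0.2.20')")
    return conn


if __name__ == "__main__":
    print(find_flows(build_toy_db(), sensor_id=1, process_id=300))
```

In the toy data the bash shell (pid 300) has no socket of its own, so the
flow is found one level up, on the sshd child. The max_depth limit and the
cycle check are there only to guarantee that the walk terminates.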

 
Good luck

Camilo Viecco


PS. Some extra comments inline with your email below:

troy d. straszheim wrote:


> I wonder if I am actually dealing with a misconfiguration of some
> kind.  Looking at table 'process':
>
> mysql> select sensor_id, process_id, src_ip from process where process_id = 6226;
> +-----------+------------+-----------+
> | sensor_id | process_id | src_ip    |
> +-----------+------------+-----------+
> | 167772226 |       6226 | 167772288 |
> +-----------+------------+-----------+
> 1 row in set (0.00 sec)
>
> that sensor_id is actually the IP of the honeywall's administration
> interface (!?), and the src_ip is the ip address of the sensor.

Yes, that is currently the default. It is not very appropriate and
causes some problems for distributed datasets.
I have written a script that generates an almost unique id under most
circumstances, but what you are seeing is what is expected.
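
The sensor_id and src_ip values in the mysql output above are IP addresses
stored as integers. Assuming they are plain unsigned 32-bit IPv4 values
with the most significant octet first (the same encoding MySQL's
INET_NTOA() expects), a small Python sketch makes them readable. The
helper name int_to_ip is made up for this example.

```python
import ipaddress


def int_to_ip(value):
    """Render an unsigned 32-bit integer as a dotted-quad IPv4 string.

    Assumes the integer holds the address with the most significant
    octet first; this is an assumption about how the table stores it.
    """
    return str(ipaddress.IPv4Address(value))


if __name__ == "__main__":
    for name, value in (("sensor_id", 167772226), ("src_ip", 167772288)):
        print(name, int_to_ip(value))
```

Under that assumption the sensor_id above decodes to 10.0.0.66 and the
src_ip to 10.0.0.128.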


> I've poked around in the argus table, and I don't see the
> correct IP addresses anywhere in the argus table.  Could it be that
> the data hasn't been ingested yet?

That is a possibility. There was a delay of around 20 seconds the last
time I checked, but if you cannot see the conversations after one minute,
there are problems somewhere else.
