Honeypots mailing list archives
Re: correlating sys_read data to "source" ip
From: Camilo Viecco <cviecco () indiana edu>
Date: Thu, 17 Aug 2006 10:31:40 -0400
Hi Troy...

<mini_rant reason='insufficient sleep time' extra='please forgive me for any harshness, we need more documentation'>
The walleye dataset is a relational dataset. Many questions can only be answered by joining results from multiple tables. You should look at the paper "towards a third generation data architecture for honeynets", which describes the philosophy in more detail. You could also look at how walleye combines the tables to generate answers (ugly but enlightening). There are also limits on what can be asked in a single query...
</mini_rant>

But let me be more helpful. Let me rewrite what I think you are asking: given a process ID, what IP address generated the activity?

That question cannot be answered in general, as many processes do not connect to the outside world. Further, many processes like sshd spawn children to handle activity. Sshd, for example, usually spawns a user shell after authentication, so the read activity from a bash shell is not directly related to the socket. That is one of the reasons why the process tree viewer in walleye is extremely important. Thus, in the general case (with the current dataset) it is not possible to accurately answer that question in a single query.

You might want to think in terms of process hierarchies in order to have a plausible answer to the question:

1. Is my process related to any argus flow?
2. Is my parent process related to any argus flow? (and recurse)

For ssh you can usually find the related flow with the parent or the parent's parent pid. It is not pretty, but it is the best we can do (for now).

So, the steps:

1. Find the process you are interested in:
     select sensor_id, process_id from sys_read where ****YOUR_CRITERIA_HERE***
2. Find the sockets related to that process (maybe you are lucky):
     select sensor_id, argus_id from sys_socket where process_id = (your previous result)
3. If there are sockets, you can use the argus_id to query the argus table for the information (you can actually mix all this in just one large query if you are using a more recent version of mysql).
4. If unsuccessful AND you know that the activity of interest is related to a network connection, find the parent process id:
     select * from process_tree where child_process_id = your_current_process_id_of_interest
5. Using this new information, go to step 2 (iterate until you get tired, find the answer, or find too many answers).

Good luck

Camilo Viecco

PS. Some extra comments inline with your email below:

troy d. straszheim wrote:
I wonder if I am actually dealing with a misconfiguration of some kind. Looking at table 'process':

mysql> select sensor_id, process_id, src_ip from process where process_id = 6226;
+-----------+------------+-----------+
| sensor_id | process_id | src_ip    |
+-----------+------------+-----------+
| 167772226 |       6226 | 167772288 |
+-----------+------------+-----------+
1 row in set (0.00 sec)

that sensor_id is actually the IP of the honeywall's administration interface (!?), and the src_ip is the ip address of the sensor.
Yes, that is currently the default. It is not very appropriate and causes some problems for distributed datasets. I have written a script that generates an almost unique id under most circumstances, but what you are seeing is what is expected.
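
For reference, the integers in the query output quoted above can be turned back into dotted-quad form. This is only a small sketch: it assumes the sensor_id and src_ip columns hold IPv4 addresses packed as plain 32-bit integers, which matches the description above but is not confirmed against the schema. The helper name int_to_ipv4 is made up for illustration.

```python
import ipaddress

def int_to_ipv4(value):
    """Convert a 32-bit integer to dotted-quad notation.

    Assumption: the column stores the address as a plain
    unsigned integer in network byte order.
    """
    return str(ipaddress.IPv4Address(value))

# The two values from the quoted query output.
print(int_to_ipv4(167772226))  # 10.0.0.66
print(int_to_ipv4(167772288))  # 10.0.0.128
```

Under the same assumption, MySQL's own INET_NTOA() function should give the same result directly in a query.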
I've poked around in the argus table, and I don't see the correct IP addresses anywhere in it. Could it be that the data hasn't been ingested yet?
That is a possibility; there was a delay of around 20 seconds the last time I checked. But if you cannot see the conversations after one minute, there are problems somewhere else.
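
The lookup steps described earlier in this message (check sys_socket for the process, otherwise move to the parent via process_tree and try again) could be sketched roughly as follows. This is an illustration against a toy in-memory SQLite database, not walleye's actual code. The table and column names sys_socket, process_tree, sensor_id, process_id, argus_id and child_process_id come from the queries above; the parent_process_id column name, the toy rows, the max_depth cutoff and the function name find_argus_ids are all assumptions.

```python
import sqlite3

def find_argus_ids(conn, sensor_id, process_id, max_depth=16):
    """Walk up the process tree until a process with sockets is found.

    Returns (pid_that_owns_sockets, [argus_id, ...]),
    or (None, []) if no ancestor has a socket.
    """
    seen = set()
    pid = process_id
    for _ in range(max_depth):
        if pid is None or pid in seen:  # no parent left, or a loop in the data
            break
        seen.add(pid)
        # Step 2: are there sockets for this process?
        rows = conn.execute(
            "SELECT argus_id FROM sys_socket "
            "WHERE sensor_id = ? AND process_id = ?",
            (sensor_id, pid),
        ).fetchall()
        if rows:
            return pid, [r[0] for r in rows]
        # Step 4: no sockets, so look up the parent and try again.
        # parent_process_id is an assumed column name.
        parent = conn.execute(
            "SELECT parent_process_id FROM process_tree "
            "WHERE child_process_id = ?",
            (pid,),
        ).fetchone()
        pid = parent[0] if parent else None
    return None, []

# Toy data (hypothetical): sshd (pid 100) owns the socket, spawns
# pid 200, which spawns the bash shell (pid 300) doing the reads.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sys_socket (sensor_id INTEGER, process_id INTEGER, argus_id INTEGER);
CREATE TABLE process_tree (child_process_id INTEGER, parent_process_id INTEGER);
INSERT INTO sys_socket VALUES (1, 100, 42);
INSERT INTO process_tree VALUES (200, 100);
INSERT INTO process_tree VALUES (300, 200);
""")

print(find_argus_ids(conn, 1, 300))  # (100, [42])
```

The returned argus_id values would then be used to query the argus table, as in step 3. The depth limit is only there so that the walk stops on odd data instead of iterating forever.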