Analyzing Wireshark Data with Pandas

by: George El., February 2019, Reading time: 2 minutes

Pandas is a Python package that is used for data analysis. You can do with pandas whatever you can do with Excel, but usually faster. First we will capture some packets with Wireshark. I let Wireshark run for a couple of minutes.

wireshark capture

Then I go to File, Export Packet Dissections, As CSV.

wireshark capture

The columns that you want to appear in the CSV must be visible in Wireshark. I have added some columns, such as the total_length of the IP packet and the TCP segment size.

First I read the CSV file (cell 1) into a pandas DataFrame df. Then I print the first 5 rows to see what my data looks like. df.shape gives me the number of rows and columns.
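
A minimal sketch of this step. The column names and the three rows are made up for illustration, and an in-memory string stands in for the exported file; with a real capture you would pass the path of the CSV to read_csv instead.

```python
import io
import pandas as pd

# Stand-in for the exported file (hypothetical columns and rows).
csv_text = io.StringIO(
    '"No.","Time","Source","Destination","Protocol","Length","Total Length"\n'
    '"1","0.000000","192.168.1.10","93.184.216.34","TCP","66","52"\n'
    '"2","0.012345","93.184.216.34","192.168.1.10","TCP","1514","1500"\n'
    '"3","0.020000","192.168.1.1","192.168.1.10","UDP","342","328"\n'
)

df = pd.read_csv(csv_text)

print(df.head())   # first 5 rows (fewer if the frame is shorter)
print(df.shape)    # (number of rows, number of columns)
```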

In cell 5 I keep only the rows whose source is not my PC, because I am interested in incoming traffic. The number of rows is now 188815.
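
The filter can be sketched with a boolean mask. The address in MY_PC and the sample rows are assumptions for illustration, not values from my capture.

```python
import pandas as pd

MY_PC = "192.168.1.10"  # hypothetical address of the capturing machine

df = pd.DataFrame({
    "Source": ["192.168.1.10", "93.184.216.34", "192.168.1.1", "192.168.1.10"],
    "Destination": ["93.184.216.34", "192.168.1.10", "192.168.1.10", "192.168.1.1"],
    "Protocol": ["TCP", "TCP", "UDP", "DNS"],
})

# The comparison yields True for every row that was not sent by my PC;
# indexing with that mask keeps only those rows.
incoming = df[df["Source"] != MY_PC]

print(incoming.shape)
```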

wireshark capture

Cell 26: I do a groupby('Protocol') and count(). This prints the packets per protocol, since each row is a packet.

Cell 27: I can also use sort_values, and I see that most packets are UDP and TCP, as expected.
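
Roughly, the two cells do something like the following. The data is invented; count() reports the non-null values in each remaining column, so any fully populated column (here "No.") works as the packet count.

```python
import pandas as pd

df = pd.DataFrame({
    "No.": [1, 2, 3, 4, 5, 6],
    "Protocol": ["TCP", "UDP", "TCP", "ARP", "UDP", "UDP"],
})

# One row per protocol, one count per remaining column
counts = df.groupby("Protocol").count()

# Largest packet count first
ranked = counts.sort_values("No.", ascending=False)

print(ranked)
```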

wireshark capture

Next I replace the spaces in column names with _ so that they are easier to work with.
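
One way to do the rename is through the string methods of the column index. The column names below are hypothetical examples of names with spaces.

```python
import pandas as pd

# Empty frame with hypothetical column names, some containing spaces
df = pd.DataFrame(columns=["No.", "Source", "total length", "tcp segment size"])

# Replace every space with an underscore (plain string match, not a regex)
df.columns = df.columns.str.replace(" ", "_", regex=False)

print(list(df.columns))
```

Without spaces, a column can also be reached as an attribute, for example df.total_length.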

wireshark capture

Then I plot a histogram of total_length for the TCP packets and another for the UDP packets. We see that most packets are between 1400 and 1500 bytes.
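
The binning behind such a histogram can be sketched without plotting, using pd.cut. The sample lengths and the 100-byte bin width are assumptions. To draw the actual chart, something like tcp.plot.hist(bins=bins) should work, provided matplotlib is installed.

```python
import pandas as pd

df = pd.DataFrame({
    "Protocol": ["TCP", "TCP", "TCP", "TCP", "UDP", "UDP", "UDP"],
    "total_length": [52, 1500, 1480, 1450, 328, 1420, 1492],
})

# Bin edges every 100 bytes: (0, 100], (100, 200], ..., (1500, 1600]
bins = list(range(0, 1601, 100))

tcp = df.loc[df["Protocol"] == "TCP", "total_length"]
udp = df.loc[df["Protocol"] == "UDP", "total_length"]

# Number of packets that fall in each bin, in bin order
tcp_hist = pd.cut(tcp, bins=bins).value_counts().sort_index()
udp_hist = pd.cut(udp, bins=bins).value_counts().sort_index()

print(tcp_hist[tcp_hist > 0])
print(udp_hist[udp_hist > 0])
```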

wireshark capture

Then I calculate the sum of total_length for each protocol and display it in a bar plot. I divide by 1024*1024 to convert bytes to MBytes.
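
A small sketch of the calculation, with invented lengths chosen so that the result is easy to check by hand. The bar plot itself would be something like mbytes.plot.bar(), which needs matplotlib.

```python
import pandas as pd

df = pd.DataFrame({
    "Protocol": ["TCP", "TCP", "UDP", "UDP"],
    "total_length": [1048576, 1048576, 524288, 524288],
})

BYTES_PER_MB = 1024 * 1024

# Sum the bytes per protocol, then convert to MBytes
mbytes = df.groupby("Protocol")["total_length"].sum() / BYTES_PER_MB

print(mbytes)
```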

wireshark capture

Then I print the packet count by protocol.

wireshark capture

Finally I divide the total packet size by the packet count to find the average packet size for each protocol.
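
The division can be sketched as follows, again with made-up numbers. Both Series are indexed by protocol, so pandas lines them up by label before dividing.

```python
import pandas as pd

df = pd.DataFrame({
    "Protocol": ["TCP", "TCP", "UDP", "UDP", "UDP"],
    "total_length": [1500, 500, 300, 600, 900],
})

grouped = df.groupby("Protocol")["total_length"]

total_bytes = grouped.sum()    # bytes per protocol
packet_count = grouped.size()  # rows (packets) per protocol

# Element-wise division, aligned on the protocol name
average = total_bytes / packet_count

print(average)
```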

wireshark capture

Just a note: although I have ARP packets, their packet size shows as zero, because I sum the total_length of IP packets and ARP packets exist only at layer 2. If I wanted to include them, I should have used the Ethernet frame size instead of the IP packet size.
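
To illustrate where the zero comes from, here is a sketch that assumes the total_length field is simply empty (NaN) for ARP rows in the CSV. I have not verified this against the export. sum() skips missing values, so a group with nothing but NaN adds up to 0.

```python
import pandas as pd

nan = float("nan")

df = pd.DataFrame({
    "Protocol": ["ARP", "ARP", "TCP"],
    "total_length": [nan, nan, 1500],  # assumed: no IP length for ARP rows
})

grouped = df.groupby("Protocol")["total_length"]

total = grouped.sum()             # NaN values are skipped, so ARP sums to 0
average = total / grouped.size()  # size() still counts the ARP rows

print(total)
print(average)
```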
