February 2019, Reading time: 2 minutes
Pandas is a Python package used for data analysis. Whatever you can do in Excel you can do with pandas, and usually faster. First we will capture some packets with Wireshark. I left Wireshark running for a couple of minutes.
Then I go to File, Export Packet Dissections, As CSV.
Any column that you want to appear in the CSV must be visible in Wireshark. I have added some columns, like the total_length of the IP packet and the TCP segment size.
First I read the CSV file (cell 1) into a pandas DataFrame, df. Then I print the first 5 rows to see what my data looks like. df.shape gives me the number of rows and columns.
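A minimal sketch of this step. The inline CSV text is a made-up stand-in for the exported file, and the column names (including "total length") are assumptions; with a real export you would pass the file path to pd.read_csv instead.

```python
import io
import pandas as pd

# Hypothetical sample standing in for the Wireshark CSV export.
# With a real file this would be pd.read_csv("<path to your export>").
csv_text = '''"No.","Time","Source","Destination","Protocol","Length","total length","Info"
"1","0.000000","192.168.1.10","8.8.8.8","UDP","74","60","Standard query"
"2","0.012345","8.8.8.8","192.168.1.10","UDP","90","76","Standard query response"
"3","0.050000","93.184.216.34","192.168.1.10","TCP","1514","1500","443 > 50000 [ACK]"
'''

df = pd.read_csv(io.StringIO(csv_text))

print(df.head())   # first 5 rows (this sample only has 3)
print(df.shape)    # tuple of (number of rows, number of columns)
```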
In cell 5 I keep only the rows whose source is not my PC, because I am interested in incoming traffic. That leaves 188815 rows.
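The filter could look roughly like this. MY_IP and the sample rows are hypothetical; the "Source" column name is assumed to match the Wireshark export.

```python
import pandas as pd

MY_IP = "192.168.1.10"  # hypothetical address of the capturing PC

# Made-up rows: two outgoing packets and two incoming ones.
df = pd.DataFrame({
    "Source": ["192.168.1.10", "8.8.8.8", "93.184.216.34", "192.168.1.10"],
    "Destination": ["8.8.8.8", "192.168.1.10", "192.168.1.10", "93.184.216.34"],
    "Protocol": ["UDP", "UDP", "TCP", "TCP"],
})

# Boolean mask: True for every row whose source is NOT my PC.
incoming = df[df["Source"] != MY_IP]

print(incoming.shape[0])  # number of rows left after the filter
```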
Cell 26: I do a groupby('Protocol') and count(). Since each row is a packet, this prints the number of packets per protocol.
Cell 27: I can also apply sort_values, and I see that most packets are UDP and TCP, as expected.
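Cells 26 and 27 could be sketched as follows, on made-up data. Counting the "No." column is my assumption; any column without missing values gives the same per-protocol counts.

```python
import pandas as pd

# Made-up capture: one row per packet.
df = pd.DataFrame({
    "No.": [1, 2, 3, 4, 5, 6],
    "Protocol": ["TCP", "UDP", "UDP", "ARP", "UDP", "TCP"],
})

# Packets per protocol: group the rows by protocol and count them.
per_protocol = df.groupby("Protocol")["No."].count()

# Largest counts first.
ranked = per_protocol.sort_values(ascending=False)
print(ranked)
```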
Next I replace the spaces in column names with _ so the columns are easier to work with.
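One way to do the renaming, shown on hypothetical column names:

```python
import pandas as pd

# Empty frame with made-up column names, two of which contain spaces.
df = pd.DataFrame(columns=["No.", "Source", "total length", "tcp segment size"])

# Replace every space in every column name with an underscore.
df.columns = df.columns.str.replace(" ", "_", regex=False)

print(list(df.columns))
```

With underscores in place, a column can also be reached as an attribute, for example df.total_length.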
Then I plot a histogram of total_length for TCP packets and another for UDP packets. Most packets are between 1400 and 1500 bytes.
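The binning behind such a histogram can be sketched without any plotting, which makes the counts easy to inspect. The data and the 100-byte bin width are assumptions, and np.histogram is swapped in here for the plotting call used in the notebook.

```python
import numpy as np
import pandas as pd

# Made-up packets; total_length is the IP total length in bytes.
df = pd.DataFrame({
    "Protocol": ["TCP", "TCP", "TCP", "TCP", "UDP", "UDP", "UDP"],
    "total_length": [40, 1500, 1480, 1452, 1428, 1450, 120],
})

bins = np.arange(0, 1600, 100)  # bin edges 0, 100, ..., 1500

hist = {}
for proto in ("TCP", "UDP"):
    lengths = df.loc[df["Protocol"] == proto, "total_length"]
    counts, edges = np.histogram(lengths, bins=bins)
    hist[proto] = counts
    # Label each count with the left edge of its bin.
    print(proto)
    print(pd.Series(counts, index=edges[:-1]))
```

In a notebook, calling lengths.plot.hist() on the same Series draws the chart itself (this needs matplotlib installed).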
Then I calculate the sum of total_length for each protocol and display it in a bar plot. I divide by 1024 and then by 1024 again to convert bytes to MBytes.
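A small sketch of the per-protocol totals and the unit conversion, on made-up numbers:

```python
import pandas as pd

# Made-up packets; total_length is in bytes.
df = pd.DataFrame({
    "Protocol": ["TCP", "TCP", "UDP", "UDP", "UDP"],
    "total_length": [1500, 1500, 1200, 600, 300],
})

# Total bytes per protocol.
total_bytes = df.groupby("Protocol")["total_length"].sum()

# bytes -> KBytes -> MBytes: divide by 1024 twice.
mbytes = total_bytes / 1024 / 1024

print(mbytes)
```

Calling mbytes.plot.bar() in a notebook would draw the bar plot (this needs matplotlib installed).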
Then I print the packet count by protocol.
Finally I divide the total packet size by the packet count to find the average packet size.
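The last two steps could look like this on made-up data. The variable names are mine, not the notebook's.

```python
import pandas as pd

# Made-up packets; total_length is in bytes.
df = pd.DataFrame({
    "Protocol": ["TCP", "TCP", "UDP", "UDP", "UDP"],
    "total_length": [1500, 500, 100, 200, 300],
})

grouped = df.groupby("Protocol")["total_length"]

total_bytes = grouped.sum()     # bytes per protocol
packet_count = grouped.count()  # packets per protocol (non-missing lengths)

# Average packet size = total bytes / number of packets.
average = total_bytes / packet_count

print(packet_count)
print(average)
```

For columns without missing values, grouped.mean() gives the same averages in one call.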
Just a note: although I have ARP packets, their packet size shows as zero, because I calculate the total_length of IP packets and ARP packets exist only at layer 2. To include them, I should have used the Ethernet frame size rather than the IP packet size.
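To illustrate why ARP ends up at zero, here is a sketch on made-up rows. I am assuming the export also contains Wireshark's "Length" column holding the frame length, and that total_length is empty (NaN) for ARP rows. pandas skips missing values when summing, so a group with only NaN sums to 0.

```python
import pandas as pd

nan = float("nan")

# Made-up rows: ARP has a frame length but no IP total length.
df = pd.DataFrame({
    "Protocol": ["ARP", "ARP", "TCP"],
    "Length": [42, 60, 1514],          # frame length in bytes (assumed column)
    "total_length": [nan, nan, 1500],  # IP total length, missing for ARP
})

# Summing the IP length: the all-NaN ARP group collapses to 0.
ip_bytes = df.groupby("Protocol")["total_length"].sum()

# Summing the frame length instead counts the ARP packets too.
frame_bytes = df.groupby("Protocol")["Length"].sum()

print(ip_bytes)
print(frame_bytes)
```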