Hashehri/Network-Traffic-Classification-UNSW-NB15

By Hatim Alshehri & Fahad Alnafisa

The network intrusion detection system (NIDS) has become an essential tool for detecting attacks on computer networks and protecting critical information and systems. The effectiveness of an NIDS is usually measured by a high number of detected attacks and a low number of false alarms. Machine learning techniques are widely used to build robust intrusion detection systems that adapt to continuous changes in network attacks. However, comparing such techniques needs more investigation to show how efficient and appropriate they are for detecting sophisticated malicious attacks. The goal of this project is to detect malicious network traffic by training a machine learning model that classifies traffic as normal or attack.
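The two effectiveness measures mentioned above can be made concrete with a small sketch. It assumes binary labels where 1 means attack and 0 means normal, and the function name `detection_metrics` and the sample labels are illustrative, not taken from the project code.

```python
def detection_metrics(y_true, y_pred):
    """Return (detection_rate, false_alarm_rate) for binary labels.

    Assumes 1 = attack and 0 = normal.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # attacks caught
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # attacks missed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # normal traffic passed

    # Detection rate (recall): share of real attacks that were flagged.
    detection_rate = tp / (tp + fn) if (tp + fn) else 0.0
    # False alarm rate: share of normal records that were flagged as attacks.
    false_alarm_rate = fp / (fp + tn) if (fp + tn) else 0.0
    return detection_rate, false_alarm_rate


# Made-up labels for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
dr, far = detection_metrics(y_true, y_pred)
print(f"detection rate = {dr:.2f}, false alarm rate = {far:.2f}")
# detection rate = 0.75, false alarm rate = 0.25
```

A good detector pushes the first number toward 1 and the second toward 0. Reporting both matters because either one can be improved trivially at the expense of the other.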

Intrusion Detection Systems (IDS) exist to guard networks against attacks and infiltration that might affect an organization. They monitor network traffic for suspicious activity and issue alerts when something is detected.

  • Signature-based intrusion detection: incoming traffic is compared against a pre-existing database of known attacks.
  • Anomaly-based intrusion detection: statistics are used to form a baseline of network usage at different time intervals, and deviations from that baseline are flagged. This approach was introduced to detect unknown attacks.
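The difference between the two approaches can be sketched in a few lines. Everything here is a toy illustration: the signature strings, the packets-per-minute baseline, and the 3-standard-deviation threshold are assumptions chosen for the example, not values used by this project.

```python
from statistics import mean, stdev

# Hypothetical signature database: payload fragments of known attacks.
KNOWN_SIGNATURES = {"' OR 1=1 --", "../../etc/passwd"}


def signature_alert(payload):
    """Signature-based: alert if the payload contains a known attack pattern."""
    return any(sig in payload for sig in KNOWN_SIGNATURES)


def anomaly_alert(baseline, observed, threshold=3.0):
    """Anomaly-based: alert if `observed` lies more than `threshold`
    standard deviations away from the mean of the baseline measurements."""
    mu = mean(baseline)
    sigma = stdev(baseline)
    if sigma == 0:
        return observed != mu
    return abs(observed - mu) / sigma > threshold


# Made-up baseline: packets per minute observed during normal operation.
baseline_pkts_per_min = [98, 102, 100, 97, 103, 101, 99, 100]

print(signature_alert("GET /index.php?id=' OR 1=1 --"))  # True
print(signature_alert("GET /index.html"))                # False
print(anomaly_alert(baseline_pkts_per_min, 104))         # False
print(anomaly_alert(baseline_pkts_per_min, 450))         # True
```

The signature check can only catch what is already in its database, while the anomaly check can flag unseen behavior at the cost of possible false alarms when legitimate traffic drifts from the baseline.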

Based on where detection takes place, intrusion detection systems can be classified into:

  • Network intrusion detection (NIDS)
  • Host intrusion detection (HIDS)

With the rise of Internet usage, protecting networks is very important. The most common risk to a network's security is an intrusion such as a brute-force attack, a denial of service, or even an infiltration from within the network. Because patterns in network behavior keep changing, a dynamic approach is needed to detect and prevent such intrusions.

Although a few datasets for NIDS were available before this one, they were generated decades ago and do not provide realistic results. That is why Nour Moustafa created this dataset: to tackle existing problems such as unbalanced data and missing values.

This dataset is a hybrid of real modern normal traffic and contemporary synthesized attack activity. Existing and novel methods are used to generate the features of the UNSW-NB15 dataset. This dataset is available here.

  • The obtained dataset consists of over 175k network traffic records with 45 features.

  • Field Description:

| Field Name | Description |
| --- | --- |
| id | Unique identifier for each record |
| dur | Record total duration |
| proto | Transaction protocol |
| service | http, ftp, ssh, dns, ..., else (-) |
| state | The state and its dependent protocol, e.g. ACC, CLO, else (-) |
| spkts | Source to destination packet count |
| dpkts | Destination to source packet count |
| sbytes | Source to destination bytes |
| dbytes | Destination to source bytes |
| rate | The average attack rate |
| sttl | Source to destination time to live |
| dttl | Destination to source time to live |
| sload | Source bits per second |
| dload | Destination bits per second |
| sloss | Source packets retransmitted or dropped |
| dloss | Destination packets retransmitted or dropped |
| sinpkt | Source inter-packet arrival time (ms) |
| dinpkt | Destination inter-packet arrival time (ms) |
| sjit | Source jitter (ms) |
| djit | Destination jitter (ms) |
| swin | Source TCP window advertisement |
| dwin | Destination TCP window advertisement |
| stcpb | Source TCP sequence number |
| dtcpb | Destination TCP sequence number |
| tcprtt | The sum of synack and ackdat of the TCP connection |
| synack | The time between the SYN and the SYN_ACK packets of the TCP connection |
| ackdat | The time between the SYN_ACK and the ACK packets of the TCP connection |
| smean | Mean of the flow packet size transmitted by the source |
| dmean | Mean of the flow packet size transmitted by the destination |
| trans_depth | The depth into the connection of the http request/response transaction |
| response_body_len | The content size of the data transferred from the server's http service |
| ct_srv_src | No. of connections that contain the same service and source address in 100 connections according to the last time |
| ct_state_ttl | No. for each state according to a specific range of values for source/destination time to live |
| ct_dst_ltm | No. of connections of the same destination address in 100 connections according to the last time |
| ct_src_dport_ltm | No. of connections of the same source address and the destination port in 100 connections according to the last time |
| ct_dst_sport_ltm | No. of connections of the same destination address and the source port in 100 connections according to the last time |
| ct_dst_src_ltm | No. of connections of the same source and destination address in 100 connections according to the last time |
| is_ftp_login | 1 if the ftp session is accessed by user and password, else 0 |
| ct_ftp_cmd | No. of flows that have a command in the ftp session |
| ct_flw_http_mthd | No. of flows that have methods such as Get and Post in the http service |
| ct_src_ltm | No. of connections of the same source address in 100 connections according to the last time |
| ct_srv_dst | No. of connections that contain the same service and destination address in 100 connections according to the last time |
| is_sm_ips_ports | 1 if the source and destination IP addresses are equal and the port numbers are equal, else 0 |
| attack_cat | The name of each attack category. This dataset has nine categories: Fuzzers, Analysis, Backdoors, DoS, Exploits, Generic, Reconnaissance, Shellcode and Worms |
| label | 0 for normal and 1 for attack records |

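Some of the field descriptions imply relationships that can be checked before training, for example that tcprtt is the sum of synack and ackdat, and that label is either 0 or 1. The sketch below shows one way such a sanity check might look. The three sample records are made up for illustration (they are not taken from UNSW-NB15), and the helper names and the tolerance of 1e-6 are assumptions.

```python
import csv
import io
import math

# Made-up records for illustration only; not real UNSW-NB15 rows.
SAMPLE_CSV = """id,synack,ackdat,tcprtt,attack_cat,label
1,0.05,0.03,0.08,Normal,0
2,0.10,0.02,0.12,Exploits,1
3,0.04,0.04,0.10,DoS,1
"""


def check_record(row):
    """Return a list of problems found in one record (empty if none)."""
    problems = []
    # tcprtt is described as the sum of synack and ackdat.
    expected_rtt = float(row["synack"]) + float(row["ackdat"])
    if not math.isclose(float(row["tcprtt"]), expected_rtt, abs_tol=1e-6):
        problems.append("tcprtt != synack + ackdat")
    # label is described as 0 for normal and 1 for attack records.
    if row["label"] not in {"0", "1"}:
        problems.append("label is not 0 or 1")
    return problems


def find_inconsistent(csv_text):
    """Map record id -> list of problems, for records that fail a check."""
    bad = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        problems = check_record(row)
        if problems:
            bad[row["id"]] = problems
    return bad


print(find_inconsistent(SAMPLE_CSV))
# {'3': ['tcprtt != synack + ackdat']}
```

On the real data the tolerance may need to be loosened to allow for rounding in the stored values.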
  • Python 3.9
  • pandas
  • numpy
  • matplotlib
  • seaborn
  • scikit-learn (sklearn)
  • PrettyTable
  • XGBoost
  • pickle (Python standard library)

About

Binary classification for detecting network intrusion attacks, intended to show how a network packet with certain features can become a serious threat to the network.
