[BLOG] Networking Functions on FPGA
This is a short note on networking functions on FPGAs. I am implementing several of them, including fast packet classification, flow tables, and network intrusion detection.
NoC on FPGA
Basics
- MAC address table (MAT): a table in a switch that maps MAC addresses to ports. With it, the switch sends each Ethernet frame only to the target port, without flooding (i.e., broadcasting) it to all ports.
- An SoC NoC may not have a MAT, as it is designed for on-chip communication rather than Ethernet networking.
- NoC router: typically stateless for better efficiency
- A router can be made stateful by adding a flow table (FT) for flow control, e.g., to manage congestion and prioritize certain traffic flows between processing elements (PEs). The FT maps each data flow to routing rules or decisions.
- Functions in NoC router:
- Data processing: compression, aggregation, checksum error detection/correction
- Traffic control: packet scheduling, load balancing, virtual channel management
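To make the MAT bullet above concrete, here is a minimal sketch of a learning table in plain Python. The class name `MacTable` and its `forward` method are made up for illustration, and the learning step (recording the source MAC on ingress) is the usual learning-switch behavior rather than something spelled out in this note.

```python
from typing import Dict, List


class MacTable:
    """Toy MAC address table for a learning switch (illustrative only)."""

    def __init__(self, num_ports: int) -> None:
        self.num_ports = num_ports
        self.table: Dict[str, int] = {}  # MAC address -> port

    def forward(self, src: str, dst: str, in_port: int) -> List[int]:
        """Return the list of egress ports for one frame."""
        # Learn: the source MAC is reachable through the ingress port.
        self.table[src] = in_port
        # Known destination: send to exactly one port, no flooding.
        if dst in self.table:
            return [self.table[dst]]
        # Unknown destination: flood to every port except the ingress port.
        return [p for p in range(self.num_ports) if p != in_port]


sw = MacTable(num_ports=4)
first = sw.forward("aa", "bb", in_port=0)   # "bb" unknown yet, so flood
reply = sw.forward("bb", "aa", in_port=2)   # "aa" was learned on port 0
```

A hardware MAT would typically be a hash table or CAM with entry aging; the dictionary here only shows the lookup-or-flood decision.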
Example: TAPA NoC
- NoC router in HLS C++: Users need to write nested conditions to check the states of the FIFO queues, and decide which FIFO to read/write.
// PE connected to a router (diagram)
// [ NoC ] <- node_in/node_out -> [ router ] <- pe_in/pe_out -> [ PE ]
template <int index>
void router(istream<Pkt>& node_in, istream<Pkt>& pe_in,
            ostream<Pkt>& node_out, ostream<Pkt>& pe_out) {
  while (!pe_in.eot()) {
    if (!pe_in.empty()) {
      // PE -> NoC: inject the PE's outgoing packet.
      node_out.write(pe_in.read());
    } else if (!node_in.empty() && IsForThisNode(node_in.peek())) {
      // NoC -> PE: the packet is addressed to this node.
      Pkt pkt = node_in.read();
      auto data = decompression(pkt.data);  // actual computation
      pe_out.write(data);
    } else if (!node_in.empty()) {
      Pkt pkt = node_in.read();
      if (pkt.dst == PE_ADDR) {
        pe_out.write(pkt);
      } else {
        node_out.write(pkt);  // pass through to the next router
      }
    }
  }
}
- A new programming interface decouples the static conditionals from the core compute logic. The router function then contains only the actual compute logic (i.e., decompression).
Pkt = hcl.Struct(
data = Array[8, UInt[8]],
dst = UInt[8]
)
def router(node_in: Pkt, node_out: Pkt):
    node_out = decompression(node_in)

# A ring NoC with N routers/PEs
for i in range(N):
    # 1. router data movement (to PE or next router)
    s.to(router[i].node_out, PE[i].data_in, when={router[i].node_in.dst == i})
    s.to(router[i].node_in, router[(i+1)%N].node_in, when={router[i].node_in.dst != i})
    # 2. PE data movement (to another PE)
    s.to(PE[i].data_out, router[i].node_in)
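The two routing rules in the listing can be checked with a small software model. The sketch below is plain Python, not the scheduling interface itself: `route_on_ring` is a hypothetical helper, and it assumes a one-way ring in which a packet that is not for router `i` moves on to router `(i + 1) % N`, as the "to PE or next router" comment describes.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Pkt:
    data: bytes
    dst: int  # index of the destination PE


def route_on_ring(pkt: Pkt, src: int, n: int) -> Tuple[int, List[int]]:
    """Return (delivering router index, routers visited) on a one-way ring."""
    if not 0 <= pkt.dst < n:
        raise ValueError("destination is not on the ring")
    i = src
    visited = [i]
    # Rule 1: deliver to the local PE when dst == i (loop exits).
    # Rule 2: otherwise hand the packet to router (i + 1) % n.
    while pkt.dst != i:
        i = (i + 1) % n
        visited.append(i)
    return i, visited


# PE 3 sends to PE 1 on a 4-node ring: the packet wraps around through router 0.
delivered_at, path = route_on_ring(Pkt(b"payload", dst=1), src=3, n=4)
```

The model ignores FIFO back-pressure and arbitration between the PE and ring inputs, which is exactly the part the nested conditionals in the HLS version have to handle.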
FPGA-based NIDS (Network Intrusion Detection System)
NIDS/Networking Basics
- IDS/IPS detection/prevention mode
- A suspicious packet can be logged and/or blocked (blocking only in IPS mode)
- An IPS needs to run at line rate, while an IDS can run asynchronously at a lower speed as a backup tap
- Snort 3.0 with the new Hyperscan string/RegEx matcher (x86-optimized)
- Still does not reach 100 Gbps line rate with 100K flows
- Snort rule-based IDS examples
alert tcp any any -> 172.198.2.1/32 333 (content: "ieca|4a4b|"; msg: "mounted success";)
# Snort IDS pipeline
1. Parser (metadata: IPs, ports, etc.)
2. Reassembly (tracks flow state, reorders out-of-order packets)
3. MSPM (multi-pattern string matching): matches fast patterns in parallel
4. Full matching (RegEx, non-fast patterns, etc.)
5. Action
- TCP flow state (e.g., the sequence number) is tracked to ensure the integrity of the packets in transmission
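To show why the reassembly stage has to come before string matching, here is a minimal sketch of per-flow reordering by sequence number. The function name `reassemble` and the `(seq, payload)` tuple format are assumptions made for this example; it also ignores retransmissions, overlapping segments, sequence-number wraparound, and buffer limits, all of which a real NIDS has to deal with.

```python
from typing import Dict, List, Tuple


def reassemble(segments: List[Tuple[int, bytes]], first_seq: int) -> bytes:
    """Rebuild the in-order byte stream of one flow from (seq, payload) pairs."""
    pending: Dict[int, bytes] = {}  # out-of-order segments, keyed by seq
    next_seq = first_seq            # per-flow state: next expected byte
    stream = bytearray()
    for seq, payload in segments:
        pending[seq] = payload
        # Drain every buffered segment that has become contiguous.
        while next_seq in pending:
            chunk = pending.pop(next_seq)
            stream += chunk
            next_seq += len(chunk)
    return bytes(stream)


# The pattern "attack" is split across two segments that arrive out of order.
segments = [(1004, b"ack!"), (1000, b"xatt")]
stream = reassemble(segments, first_seq=1000)
hit_in_stream = b"attack" in stream
hit_in_any_packet = any(b"attack" in p for _, p in segments)
```

Matching each packet on its own misses the pattern here, while matching the reassembled stream finds it. The price is the per-flow state (`next_seq` and the pending buffer) that has to be kept for every active flow.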
Implementation: Fast MSPM
- Pigasus, Chapter 6, p. 75: an FPGA is chosen because it is low-latency (no PCIe overhead), low-power, and reconfigurable, and it allows fine-grained parallelism
- MSPM on FPGA:
- Classic FSM-based approach: NFA (non-deterministic finite automaton) or DFA (deterministic finite automaton)
- Dynamic programming (DP) based approximate matching (AM)
- Families of FPGA-Based Algorithms for Approximate String Matching
- Used for string matching, protein sequence alignment, etc., with algorithms such as Smith-Waterman and Needleman-Wunsch
- These approaches cannot reach line rate in real-time, latency-sensitive network environments
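For reference, the DP-based approximate matching mentioned above can be sketched in a few lines. This is one common textbook formulation (edit distance where a match may start anywhere in the text, often attributed to Sellers), not necessarily the one used in the cited paper; `approx_find` is a name chosen for this example.

```python
from typing import List


def approx_find(pattern: str, text: str, k: int) -> List[int]:
    """End positions (0-based) in text where pattern matches with <= k edits."""
    m = len(pattern)
    # col[i] = min edit distance between pattern[:i] and some substring of the
    # text ending at the current position. col[0] stays 0, so a match may
    # start anywhere in the text.
    col = list(range(m + 1))
    ends = []
    for j, ch in enumerate(text):
        prev_diag = col[0]  # value of col[i - 1] from the previous text position
        for i in range(1, m + 1):
            old = col[i]
            cost = 0 if pattern[i - 1] == ch else 1
            col[i] = min(prev_diag + cost,  # match or substitution
                         old + 1,           # extra character in the text
                         col[i - 1] + 1)    # pattern character missing from the text
            prev_diag = old
        if col[m] <= k:
            ends.append(j)
    return ends


exact = approx_find("abc", "xxabcxx", k=0)
fuzzy = approx_find("abc", "abd", k=1)
```

Each input character updates all `len(pattern)` cells, and each cell depends on its neighbors, so the work per byte grows with the pattern length and has to be repeated per pattern. That per-byte cost is the part that makes this family hard to run at line rate, even though hardware designs can compute the cells of an anti-diagonal in parallel.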