Table of Contents For
Network Systems Design
Using Network Processors (Agere version)

    Foreword   xix

    Preface   xxi

    Chapter 1   Introduction And Overview    1

    1.1 Network Systems And The Internet    1
    1.2 Applications Vs. Infrastructure    1
    1.3 Network Systems Engineering    2
    1.4 Packet Processing    2
    1.5 Achieving High Speed    3
    1.6 Network Speed    3
    1.7 Hardware, Software, And Hybrids    4
    1.8 Scope And Organization Of The Text    5
    1.9 Summary    5
    For Further Study    5

    Chapter 2   Basic Terminology And Example Systems    7

    2.1 Introduction    7
    2.2 Networks And Packets    7
    2.3 Connection-Oriented And Connectionless Paradigms    8
    2.4 Digital Circuits    8
    2.5 LAN And WAN Classifications    9
    2.6 The Internet And Heterogeneity    9
    2.7 Example Network Systems    9
    2.8 Broadcast Domains    10
    2.9 The Two Key Systems Used In The Internet    11
    2.10 Other Systems Used In The Internet    12
    2.11 Monitoring And Control Systems    13
    2.12 Summary    13
    For Further Study    13

    Chapter 3   Review Of Protocols And Packet Formats    15

    3.1 Introduction    15
    3.2 Protocols And Layering    15
    3.3 Layers 1 And 2 (Physical And Network Interface)    17
    3.3.1 Ethernet    17
    3.3.2 Ethernet Frame Format    17
    3.3.3 Ethernet Addresses    18
    3.3.4 Ethernet Type Field    19
    3.4 Layer 3 (Internet)    19
    3.4.1 The Internet Protocol    19
    3.4.2 IP Datagram Format    19
    3.4.3 IP Addresses    20
    3.5 Layer 4 (Transport)    20
    3.5.1 UDP And TCP    20
    3.5.2 UDP Datagram Format    21
    3.5.3 TCP Segment Format    22
    3.6 Protocol Port Numbers And Demultiplexing    23
    3.7 Encapsulation And Transmission    23
    3.8 Address Resolution Protocol    24
    3.9 Summary    24
    For Further Study    24

    PART I    Traditional Protocol Processing Systems    27

    Chapter 4   Conventional Computer Hardware Architecture    29

    4.1 Introduction    29
    4.2 A Conventional Computer System    29
    4.3 Network Interface Cards    30
    4.4 Definition Of A Bus    31
    4.5 The Bus Address Space    32
    4.6 The Fetch-Store Paradigm    33
    4.7 Network Interface Card Functionality    34
    4.8 NIC Optimizations For High Speed    34
    4.9 Onboard Address Recognition    35
    4.9.1 Unicast And Broadcast Recognition And Filtering    35
    4.9.2 Multicast Recognition And Filtering    35
    4.10 Onboard Packet Buffering    36
    4.11 Direct Memory Access    37
    4.12 Operation And Data Chaining    38
    4.13 Data Flow Diagram    39
    4.14 Promiscuous Mode    39
    4.15 Summary    40
    For Further Study    40

    Chapter 5   Basic Packet Processing: Algorithms And Data Structures    43

    5.1 Introduction    43
    5.2 State Information and Resource Exhaustion    43
    5.3 Packet Buffer Allocation    44
    5.4 Packet Buffer Size And Copying    45
    5.5 Protocol Layering And Copying    45
    5.6 Heterogeneity And Network Byte Order    46
    5.7 Bridge Algorithm    47
    5.8 Table Lookup And Hashing    49
    5.9 IP Datagram Fragmentation And Reassembly    50
    5.9.1 Interpretation Of The Flags Field    51
    5.9.2 Interpretation Of The Fragment Offset Field    51
    5.9.3 IP Fragmentation Algorithm    52
    5.9.4 Fragmenting A Fragment    53
    5.9.5 IP Reassembly    53
    5.9.6 Grouping Fragments Together    54
    5.9.7 Fragment Position    54
    5.9.8 IP Reassembly Algorithm    55
    5.10 IP Datagram Forwarding    56
    5.11 IP Forwarding Algorithm    57
    5.12 High-Speed IP Forwarding    57
    5.13 TCP Connection Recognition Algorithm    59
    5.14 TCP Splicing Algorithm    60
    5.15 Summary    63
    For Further Study    63
    Exercises    63

    Chapter 6   Packet Processing Functions    67

    6.1 Introduction    67
    6.2 Packet Processing    68
    6.3 Address Lookup And Packet Forwarding    68
    6.4 Error Detection And Correction    69
    6.5 Fragmentation, Segmentation, And Reassembly    70
    6.6 Frame And Protocol Demultiplexing    70
    6.7 Packet Classification    71
    6.7.1 Static And Dynamic Classification    71
    6.7.2 Demultiplexing Vs. Classification    71
    6.7.3 Optimized Packet Processing    72
    6.7.4 Classification Languages    72
    6.8 Queueing And Packet Discard    73
    6.8.1 Basic Queueing    73
    6.8.2 Priority Mechanisms    74
    6.8.3 Packet Discard    75
    6.9 Scheduling And Timing    75
    6.10 Security: Authentication And Privacy    76
    6.11 Traffic Measurement And Policing    76
    6.12 Per-Flow And Aggregate Policing    77
    6.13 Traffic Characteristics    78
    6.13.1 Constant Bit Rate (CBR)    78
    6.13.2 Variable Bit Rate (VBR)    78
    6.13.3 Available Bit Rate (ABR)    79
    6.13.4 Unspecified Bit Rate (UBR)    79
    6.14 Traffic Shaping    80
    6.15 The Point Of Traffic Management    81
    6.16 Timer Management    82
    6.17 Summary    83
    For Further Study    83
    Exercises    83

    Chapter 7   Protocol Software On A Conventional Processor    87

    7.1 Introduction    87
    7.2 Implementation Of Packet Processing In An Application    87
    7.3 Fast Packet Processing In Software    88
    7.4 Embedded Systems    88
    7.5 Operating System Implementations    89
    7.6 Software Interrupts And Priorities    89
    7.7 Multiple Priorities And Kernel Threads    91
    7.8 Thread Synchronization    92
    7.9 Software For Layered Protocols    92
    7.9.1 One Thread Per Layer    93
    7.9.2 One Thread Per Protocol    94
    7.9.3 Multiple Threads Per Protocol    94
    7.9.4 Separate Timer Management Threads    94
    7.9.5 One Thread Per Packet    95
    7.10 Asynchronous Vs. Synchronous Programming    96
    7.11 Summary    97
    For Further Study    97
    Exercises    97

    Chapter 8   Hardware Architectures For Protocol Processing    101

    8.1 Introduction    101
    8.2 Network Systems Architecture    101
    8.3 The Traditional Software Router    102
    8.4 Aggregate Data Rate    103
    8.5 Aggregate Packet Rate    103
    8.6 Packet Rate And Software Router Feasibility    105
    8.7 Overcoming The Single CPU Bottleneck    107
    8.8 Fine-Grain Parallelism    108
    8.9 Symmetric Coarse-Grain Parallelism    108
    8.10 Asymmetric Coarse-Grain Parallelism    109
    8.11 Special-Purpose Coprocessors    109
    8.12 ASIC Coprocessor Implementation    110
    8.13 NICs With Onboard Processing    111
    8.14 Smart NICs With Onboard Stacks    112
    8.15 Cells And Connection-Oriented Addressing    112
    8.16 Data Pipelines    113
    8.17 Summary    115
    For Further Study    115
    Exercises    115

    Chapter 9   Classification And Forwarding    119

    9.1 Introduction    119
    9.2 Inherent Limits Of Demultiplexing    119
    9.3 Packet Classification    120
    9.4 Software Implementation Of Classification    121
    9.5 Optimizing Software-Based Classification    122
    9.6 Software Classification On Special-Purpose Hardware    123
    9.7 Hardware Implementation Of Classification    123
    9.8 Optimized Classification Of Multiple Rule Sets    124
    9.9 Classification Of Variable-Size Headers    126
    9.10 Hybrid Hardware\|/\|Software Classification    127
    9.11 Dynamic Vs. Static Classification    128
    9.12 Fine-Grain Flow Creation    129
    9.13 Flow Forwarding In A Connection-Oriented Network    130
    9.14 Connectionless Network Classification And Forwarding    130
    9.15 Second Generation Network Systems    131
    9.16 Embedded Processors In Second Generation Systems    132
    9.17 Classification And Forwarding Chips    133
    9.18 Summary    134
    For Further Study    134
    Exercises    134

    Chapter 10   Switching Fabrics    137

    10.1 Introduction    137
    10.2 Bandwidth Of An Internal Fast Path    137
    10.3 The Switching Fabric Concept    138
    10.4 Synchronous And Asynchronous Fabrics    139
    10.5 A Taxonomy Of Switching Fabric Architectures    140
    10.6 Dedicated Internal Paths And Port Contention    140
    10.7 Crossbar Architecture    141
    10.8 Basic Queueing    143
    10.9 Time Division Solutions: Sharing Data Paths    145
    10.10 Shared Bus Architecture    145
    10.11 Other Shared Medium Architectures    146
    10.12 Shared Memory Architecture    147
    10.13 Multistage Fabrics    148
    10.14 Banyan Architecture    149
    10.15 Scaling A Banyan Switch    150
    10.16 An Example Commercial Technology    152
    10.17 Protocol Independence: The Agere PI40    153
    10.18 Implementation Of The Agere Fabric    153
    10.19 Capacity Of The Agere Fabric    155
    10.20 Other Commercial Technologies    155
    10.21 Practical Issues    155
    10.22 Summary    156
    For Further Study    156
    Exercises    156

    PART II    Network Processor Technology    159

    Chapter 11   Network Processors: Motivation And Purpose    161

    11.1 Introduction    161
    11.2 The CPU In A Second Generation Architecture    161
    11.3 Third Generation Network Systems    162
    11.4 The Motivation For Embedded Processors    163
    11.5 RISC vs. CISC    163
    11.6 The Need For Custom Silicon    164
    11.7 Definition Of A Network Processor    165
    11.8 A Fundamental Idea: Flexibility Through Programmability    166
    11.9 Instruction Set    167
    11.10 Scalability With Parallelism And Pipelining    167
    11.11 The Costs And Benefits Of Network Processors    168
    11.12 Network Processors And The Economics Of Success    169
    11.13 The Status And Future Of Network Processors    170
    11.14 Summary    170
    For Further Study    171
    Exercises    171

    Chapter 12   The Complexity Of Network Processor Design    173

    12.1 Introduction    173
    12.2 Network Processor Functionality    173
    12.3 Packet Processing Functions    174
    12.4 Ingress And Egress Processing    175
    12.4.1 Ingress Processing    175
    12.4.2 Egress Processing    176
    12.5 Parallel And Distributed Architecture    178
    12.6 The Architectural Roles Of Network Processors    179
    12.7 Consequences For Each Architectural Role    179
    12.8 Macroscopic Data Pipelining And Heterogeneity    181
    12.9 Network Processor Design And Software Emulation    181
    12.10 Summary    182
    For Further Study    182
    Exercises    183

    Chapter 13   Network Processor Architectures    185

    13.1 Introduction    185
    13.2 Architectural Variety    185
    13.3 Primary Architectural Characteristics    186
    13.3.1 Processor Hierarchy    186
    13.3.2 Memory Hierarchy    187
    13.3.3 Internal Transfer Mechanisms    189
    13.3.4 External Interface And Communication Mechanisms    190
    13.3.5 Special-Purpose Hardware    191
    13.3.6 Polling And Notification Mechanisms    191
    13.3.7 Concurrent Execution Support    192
    13.3.8 Hardware Support For Programming    193
    13.3.9 Hardware And Software Dispatch Mechanisms    193
    13.3.10 Implicit Or Explicit Parallelism    194
    13.4 Architecture, Packet Flow, And Clock Rates    194
    13.5 Software Architecture    197
    13.6 Assigning Functionality To The Processor Hierarchy    197
    13.7 Summary    199
    For Further Study    200
    Exercises    200

    Chapter 14   Issues In Scaling A Network Processor    203

    14.1 Introduction    203
    14.2 The Processing Hierarchy And Scaling    203
    14.3 Scaling By Making Processors Faster    204
    14.4 Scaling By Increasing The Number of Processors    204
    14.5 Scaling By Increasing Processor Types    205
    14.6 Scaling A Memory Hierarchy    206
    14.7 Scaling By Increasing Memory Size    208
    14.8 Scaling By Increasing Memory Bandwidth    208
    14.9 Scaling By Increasing Types Of Memory    209
    14.10 Scaling By Adding Memory Caches    210
    14.11 Scaling With Content Addressable Memory    211
    14.12 Using CAM for Packet Classification    213
    14.13 Other Limitations On Scale    215
    14.14 Software Scalability    216
    14.15 Bottlenecks And Scale    217
    14.16 Summary    217
    For Further Study    218
    Exercises    218

    Chapter 15   Examples Of Commercial Network Processors    221

    15.1 Introduction    221
    15.2 An Explosion Of Commercial Products    221
    15.3 A Selection of Products    222
    15.4 Augmented RISC Processor (Alchemy)    222
    15.5 Embedded Processor Plus Coprocessors (AMCC)    223
    15.6 Pipeline Of Homogeneous Processors (Cisco)    225
    15.7 Pipeline Of Heterogeneous Processors (EZchip)    226
    15.8 Extensive And Diverse Processors (Hifn)    227
    15.9 Homogeneous Parallel Processors Plus Controller (Intel)    230
    15.10 Flexible RISC Plus Coprocessors (Motorola)    233
    15.11 Extremely Long Homogeneous Pipeline (Xelerated)    237
    15.12 Summary    238
    For Further Study    239
    Exercises    239

    Chapter 16   Design Tradeoffs And Consequences    241

    16.1 Introduction    241
    16.2 Low Development Cost Vs. Performance    241
    16.3 Programmability Vs. Processing Speed    242
    16.4 Performance: Packet Rate, Data Rate, And Bursts    242
    16.5 Speed Vs. Functionality    243
    16.6 Per-Interface Rate Vs. Aggregate Data Rate    243
    16.7 Network Processor Speed Vs. Bandwidth    243
    16.8 Coprocessor Design: Lookaside Vs. Flow-Through    244
    16.9 Pipelining: Uniform Vs. Synchronized    244
    16.10 Explicit Parallelism Vs. Cost And Programmability    244
    16.11 Parallelism: Scale Vs. Packet Ordering    245
    16.12 Parallelism: Speed Vs. Stateful Classification    245
    16.13 Memory: Speed Vs. Programmability    245
    16.14 I/O Performance Vs. Pin Count    246
    16.15 Programming Languages: A Three-Way Tradeoff    246
    16.16 Multithreading: Throughput Vs. Programmability    246
    16.17 Traffic Management Vs. Blind Forwarding At Low Cost    247
    16.18 Generality Vs. Specific Architectural Role    247
    16.19 Memory Type: Special-Purpose Vs. General-Purpose    247
    16.20 Backward Compatibility Vs. Architectural Advances    248
    16.21 Parallelism Vs. Pipelining    248
    16.22 Summary    249
    Exercises    249

    PART III    Example Network Processor    251

    Chapter 17   Overview Of The Agere Network Processor    253

    17.1 Introduction    253
    17.2 Agere Terminology    253
    17.3 Agere PayloadPlus (APP)    254
    17.4 Conceptual Pipeline    254
    17.5 Functional Block Characteristics    255
    17.6 First-Generation Agere Implementation    256
    17.7 Second Generation Agere Implementation (APP550)    257
    17.8 Basic APP550 Features    258
    17.9 External Connections    259
    17.9.1 Memory Interfaces    260
    17.9.2 Media Interfaces    260
    17.9.3 Switching Fabric Interface    261
    17.9.4 PCI Bus Interface    261
    17.9.5 Scheduling Interface    261
    17.9.6 Coprocessor Interface    262
    17.10 Memory Uses    262
    17.11 Internal Architecture    263
    17.12 Engines On The APP550    264
    17.13 High-Speed Full Duplex Operation    264
    17.14 The Underlying Complexity    265
    17.15 Summary    265
    For Further Study    266
    Exercises    266

    Chapter 18   Functional Units On The Agere APP550    269

    18.1 Introduction    269
    18.2 Hierarchical Structure And Vestigial Terminology    269
    18.3 Major Functional Units On The APP550    270
    18.4 Input Interface    271
    18.5 Pattern Processing Engine (PPE)    271
    18.6 Data Flow Through The Classifier    273
    18.7 PDU Assembler    274
    18.8 Reorder Buffer    274
    18.9 State Engine    275
    18.10 Traffic Manager    276
    18.11 Stream Editor    278
    18.12 Programming, Performance, And Global Pulse Rate    279
    18.13 Other Functional Units On The APP550    280
    18.14 Summary    280
    For Further Study    281

    Chapter 19   Reference Platform And Simulator (HDS, SPA)    283

    19.1 Introduction    283
    19.2 Reference Systems    283
    19.3 Simulation Vs. Emulation    284
    19.4 The Agere Reference System    285
    19.5 Software Development Environment (SDE)    285
    19.6 System Performance Analyzer (SPA)    286
    19.7 Hardware Development System (HDS)    287
    19.8 HDS Bootstrap And Operation    288
    19.9 Run Time Environment (RTE)    288
    19.10 Testing Paradigms    289
    19.11 Summary    289
    For Further Study    290

    Chapter 20   Classification With A Pattern Matching Language (FPL)    293

    20.1 Introduction    293
    20.2 Support For High-Speed Classification    293
    20.3 Agere's Functional Programming Language (FPL)    294
    20.4 FPL Characteristics    294
    20.5 Two Pass Processing And The Division Into Blocks    296
    20.6 FPL Pass 1: The Root Program    296
    20.7 FPL Pass 2: The Replay Program    297
    20.8 Status Values Passed With A Packet    297
    20.9 Algorithms For First And Second Pass Processing    298
    20.10 Designating The First And Second Pass    299
    20.11 Pattern Matching Paradigm    300
    20.12 Using Patterns For Conditionals    301
    20.13 Symbolic Constants And Include Files    303
    20.14 Example FPL Code For Second Pass Processing    303
    20.15 Sequential Pattern Matching Paradigm    304
    20.16 Tree Functions And The BITS Default    306
    20.17 Return Values    306
    20.18 Passing Information To The Traffic Manager Block    306
    20.19 Sticky Bits    307
    20.20 Access To Built-in And External Functions    307
    20.21 FPL Constant Syntax    308
    20.22 Dotted Decimal Patterns And Longest Prefix Match    308
    20.23 FPL Support For Dynamic Classification    309
    20.24 Ordered Virtual Functions    310
    20.25 Ordered Functions And State Access    310
    20.26 Debugging FPL    311
    20.27 Summary    311
    For Further Study    312
    Exercises    312

    Chapter 21   State Engine And Scripting Language (C-NP)    315

    21.1 Introduction    315
    21.2 State Engine Functionality    315
    21.3 External Host Interface    316
    21.4 State Engine Control And Status Registers    316
    21.5 State Engine Memory And Address Space    317
    21.6 Onboard Configuration Bus And External Connection    317
    21.7 ASI Functions And Access From FPL    318
    21.8 Policing Engine And Policing Scripts    318
    21.9 Return Values From Policing    319
    21.10 Binding A Script ID To A Specific Function    319
    21.11 FPL Prototype Declaration And Example    320
    21.12 Initialization Of The Policing Database    320
    21.13 Register Files And Optimized Access    321
    21.14 C-NP Language Overview    322
    21.14.1 Lexical Conventions    322
    21.14.2 Data Declarations    323
    21.14.3 Expressions    324
    21.14.4 Statements    325
    21.14.5 Preprocessor Directives    325
    21.14.6 Script Structure    326
    21.15 An Example Policing Script    326
    21.16 Summary    328
    For Further Study    328
    Exercises    329

    Chapter 22   Traffic Manager (TM)    331

    22.1 Introduction    331
    22.2 Policing, Queueing, And Scheduling Decisions    331
    22.3 Buffer Management    332
    22.4 Completion Of Flow Policing    333
    22.5 A Unified Packet Discard Algorithm (WRED)    334
    22.6 Implementation Of Weighted RED On The APP550    335
    22.7 Scheduling And Traffic Shaping    338
    22.8 Traffic Shaping And The Agere Definition Of Scheduler    338
    22.9 CBR Shaping    340
    22.10 VBR Shaping    340
    22.11 Bandwidth Allocation In A Packet Switching System    341
    22.12 The Two Forms Of Bandwidth Allocation    341
    22.13 Fixed Bandwidth Allocation    342
    22.14 Implementation Of Fixed Allocation    342
    22.15 Proportional Bandwidth Allocation    343
    22.16 An Example Of Proportional Allocation    343
    22.17 Implementation Of Proportional Bandwidth Allocation    345
    22.18 Smoothed Deficit Weighted Round Robin    345
    22.19 A Hierarchy Of Queues    346
    22.20 Traffic Management Mechanisms On The APP550    346
    22.21 Summary    348
    For Further Study    349

    Chapter 23   Host Interface And Control Functions    351

    23.1 Introduction    351
    23.2 Motivation For An External Processor    351
    23.3 Role Of An External Host Processor    352
    23.4 Physical Interconnection To An External Host    353
    23.5 Packet Exchange And The Concept Of Pseudo Interface    353
    23.6 An Application Program Interface For External Hosts    353
    23.7 Programming Paradigm And Handle Declarations    354
    23.8 Initialization Functions    355
    23.9 Examples Of Object Functions    356
    23.10 A Dynamic Classification Example    357
    23.11 An Example Of Slow Path Packet Transfer    359
    23.12 Summary    361
    For Further Study    362
    Exercises    362

    Chapter 24   An Example System    365

    24.1 Introduction    365
    24.2 An ISP Access Node    365
    24.3 Differentiated Services (DiffServ)    366
    24.4 Mapping SLA Requirements To DiffServ Classes    367
    24.5 Queues, Destination IDs, And Classification    369
    24.6 Policing, Coloring, And Flow IDs    374
    24.7 Example Code: Classification (FPL)    375
    24.8 Example Code: Policing Scripts    386
    24.9 Buffer Management And Packet Discard    388
    24.10 Example Code: Buffer Management And Discard (WRED)    389
    24.11 Scheduler    391
    24.12 Dynamic Scheduler    393
    24.13 Example Code: SDWRR Scheduler    393
    24.14 Packet Marking (Modification)    396
    24.15 Example Code: Packet Marking    396
    24.16 Example Code: Host Interface    397
    24.17 Summary    401
    For Further Study    402
    Exercises    402

    Appendix 1   Glossary Of Terms And Abbreviations    403

    Bibliography   451

    Index   455


If you have any questions or concerns about the site, please contact <bobd@saintjoe.edu>