Linux Ethernet-Howto: Technical Information

8. Technical Information

For those who want to play with the present drivers, or try to make up their own driver for a card that is presently unsupported, this information should be useful. If you do not fall into this category, then perhaps you will want to skip this section.

8.1 Probed Addresses

While trying to determine what ethernet card is there, the following addresses are autoprobed, assuming the type and specs of the card have not been set in the kernel. The file names below are in /usr/src/linux/drivers/net/

        3c501.c:        0x280, 0x300
        3c503.c:        0x300, 0x310, 0x330, 0x350, 0x250, 0x280, 0x2a0, 0x2e0
        3c505.c:        0x300, 0x280, 0x310
        3c507.c:        0x300, 0x320, 0x340, 0x280
        3c509.c:        Special ID Port probe
        apricot.c:      0x300
        at1700.c:       0x300, 0x280, 0x380, 0x320, 0x340, 0x260, 0x2a0, 0x240
        atp.c:          0x378, 0x278, 0x3bc
        depca.c:        0x300, 0x200
        de600.c:        0x378
        de620.c:        0x378
        eexpress.c:     0x300, 0x270, 0x320, 0x340
        hp.c:           0x300, 0x320, 0x340, 0x280, 0x2C0, 0x200, 0x240
        hp-plus.c:      0x200, 0x240, 0x280, 0x2C0, 0x300, 0x320, 0x340
        lance.c:        0x300, 0x320, 0x340, 0x360
        ne.c:           0x300, 0x280, 0x320, 0x340, 0x360
        ni52.c:         0x300, 0x280, 0x360, 0x320, 0x340
        ni65.c:         0x300, 0x320, 0x340, 0x360
        smc-ultra.c:    0x200, 0x220, 0x240, 0x280, 0x300, 0x340, 0x380
        wd.c:           0x300, 0x280, 0x380, 0x240

There are some NE2000 clone ethercards out there that are waiting black holes for autoprobe drivers. While many NE2000 clones are safe until they are enabled, some can't be reset to a safe mode. These dangerous ethercards will hang any I/O access to their `dataports'. The typical dangerous locations are:

        Ethercard jumpered base     Dangerous locations (base + 0x10 - 0x1f)
                0x300 *                         0x310-0x317
                0x320                           0x330-0x337
                0x340                           0x350-0x357
                0x360                           0x370-0x377

* The 0x300 location is the traditional place to put an ethercard, but it's also a popular place to put other devices (often SCSI controllers). The 0x320 location is often the next one chosen, but that's bad for the AHA1542 driver probe. The 0x360 location is bad, because it conflicts with the parallel port at 0x378. If you have two IDE controllers, or two floppy controllers, then 0x360 is also a bad choice, as an NE2000 card will clobber them as well.
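The rows of the table above follow a fixed rule, which can be written down as a small helper. This is only a sketch: the names `port_range`, `ne2000_danger_range` and `port_is_dangerous` are made up for illustration and do not appear in the kernel.

```c
#include <stdbool.h>

/* Hypothetical helper, not kernel code: an inclusive range of I/O ports. */
struct port_range {
    unsigned int first;
    unsigned int last;
};

/* Data ports of an NE2000 clone jumpered at `base`, following the rows
 * of the table above (base + 0x10 through base + 0x17). */
static struct port_range ne2000_danger_range(unsigned int base)
{
    struct port_range r = { base + 0x10, base + 0x17 };
    return r;
}

/* True if touching `port` could hit the data ports of an NE2000 clone
 * jumpered at any of the four bases listed in the table. */
static bool port_is_dangerous(unsigned int port)
{
    static const unsigned int bases[] = { 0x300, 0x320, 0x340, 0x360 };
    for (unsigned int i = 0; i < sizeof bases / sizeof bases[0]; i++) {
        struct port_range r = ne2000_danger_range(bases[i]);
        if (port >= r.first && port <= r.last)
            return true;
    }
    return false;
}
```

A probe routine for some other device could call `port_is_dangerous()` on each address it intends to read and skip, or postpone, the ones that return true.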

Note that kernels > 1.1.7X keep a log of who uses which i/o ports, and will not let a driver use i/o ports registered by an earlier driver. This may result in probes silently failing. You can view who is using what i/o ports by typing cat /proc/ioports if you have the proc filesystem enabled.
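If you want to check for a conflict by hand, the test is a simple range overlap. The sketch below parses one line in the form `first-last : owner` and compares it with a candidate range. The exact line layout is an assumption (check the output on your own kernel), and the structure and function names are illustrative only.

```c
#include <stdbool.h>
#include <stdio.h>

/* One entry as assumed to appear in /proc/ioports: "0300-031f : NE2000". */
struct ioport_entry {
    unsigned int first;
    unsigned int last;
    char owner[64];
};

/* Parse a single line; returns true on success.  Leading blanks are
 * skipped by the scanf format. */
static bool parse_ioports_line(const char *line, struct ioport_entry *e)
{
    return sscanf(line, " %x-%x : %63[^\n]",
                  &e->first, &e->last, e->owner) == 3;
}

/* True if the inclusive range [first, last] overlaps the entry. */
static bool overlaps(const struct ioport_entry *e,
                     unsigned int first, unsigned int last)
{
    return first <= e->last && last >= e->first;
}
```

Feeding each line of the file through `parse_ioports_line()` and then `overlaps()` tells you which registered driver, if any, already owns the ports your probe wants.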

To avoid these lurking ethercards, here are the things you can do:

Probe for the device's BIOS in memory space. This is easy and always safe, but it only works for cards that always have BIOSes, like primary SCSI controllers.
Avoid probing any of the above locations until you think you've located your device. The NE2000 clones have a reset range from <base>+0x18 to <base>+0x1f that will read as 0xff, so probe there first if possible. It's also safe to probe in the 8390 space at <base>+0x00 - <base>+0x0f, but that area will return quasi-random values.
If you must probe in the dangerous range, for instance if your target device has only a few port locations, first check that there isn't an NE2000 there. You can see how to do this by looking at the probe code in /usr/src/linux/drivers/net/ne.c
Use the `reserve' boot time argument to protect volatile areas from being probed. See the information on using boot time arguments with LILO in The reserve command.
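The second suggestion, looking at the reset range first, can be sketched as follows. To keep the example runnable outside the kernel, port reads go through a caller-supplied function pointer rather than a real inb(). That indirection, the simulated card, and all the names here are assumptions made for illustration.

```c
#include <stdbool.h>

/* Stand-in for a port read such as inb(); supplied by the caller so the
 * sketch can run against simulated hardware. */
typedef unsigned char (*port_read_fn)(unsigned int port);

/* Returns true if base+0x18 .. base+0x1f all read back as 0xff, which is
 * what the text says an NE2000 clone's reset range looks like.  Treat a
 * match as a hint to stay away, not as proof that a card is present. */
static bool looks_like_ne2000_reset_range(unsigned int base, port_read_fn rd)
{
    for (unsigned int off = 0x18; off <= 0x1f; off++) {
        if (rd(base + off) != 0xff)
            return false;
    }
    return true;
}

/* Simulated port space for demonstration: a fake NE2000 clone at 0x300. */
static unsigned char fake_read(unsigned int port)
{
    if (port >= 0x318 && port <= 0x31f)
        return 0xff;      /* reset range of the fake card */
    return 0x5a;          /* arbitrary value standing in for anything else */
}
```

With the simulated card, `looks_like_ne2000_reset_range(0x300, fake_read)` reports a match, so a careful probe would leave 0x310-0x317 alone.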

8.2 Writing a Driver

The only thing that one needs to use an ethernet card with Linux is the appropriate driver. For this, it is essential that the manufacturer will release the technical programming information to the general public without you (or anyone) having to sign your life away. A good guide for the likelihood of getting documentation (or, if you aren't writing code, the likelihood that someone else will write that driver you really, really need) is the availability of the Crynwr (nee Clarkson) packet driver. Russ Nelson runs this operation, and has been very helpful in supporting the development of drivers for Linux. Net-surfers can try this URL to look up Russ' software.

Russ Nelson's Packet Drivers

Given the documentation, you can write a driver for your card and use it for Linux (at least in theory). Keep in mind that some old hardware that was designed for XT type machines will not function very well in a multitasking environment such as Linux. Use of these will lead to major problems if your network sees a reasonable amount of traffic.

Most cards come with drivers for MS-DOS interfaces such as NDIS and ODI, but these are useless for Linux. Many people have suggested directly linking them in or automatic translation, but this is nearly impossible. The MS-DOS drivers expect to be in 16 bit mode and hook into `software interrupts', both incompatible with the Linux kernel. This incompatibility is actually a feature, as some Linux drivers are considerably better than their MS-DOS counterparts. The `8390' series drivers, for instance, use ping-pong transmit buffers, which are only now being introduced in the MS-DOS world.

(Ping-pong Tx buffers means using at least 2 max-size packet buffers for Tx packets. One is loaded while the card is transmitting the other. The second is then sent as soon as the first finished, and so on. In this way, most cards are able to continuously send back-to-back packets onto the wire.)
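The bookkeeping behind ping-pong buffers is small. Here is a minimal sketch of it as a user-space model, assuming exactly two buffers; the structure and function names are invented for this example and are not taken from the 8390 drivers.

```c
#include <stdbool.h>

#define TX_SLOTS 2   /* two max-size packet buffers on the card */

/* Hypothetical driver-side state for ping-pong transmit buffers. */
struct pingpong {
    bool loaded[TX_SLOTS];  /* slot holds a packet (waiting or on the wire) */
    int  sending;           /* slot currently on the wire, or -1 if idle */
};

static void pp_init(struct pingpong *pp)
{
    pp->loaded[0] = pp->loaded[1] = false;
    pp->sending = -1;
}

/* Try to load a packet.  Returns the slot used, or -1 if both are full
 * (the point at which a driver would report itself busy). */
static int pp_load(struct pingpong *pp)
{
    for (int i = 0; i < TX_SLOTS; i++) {
        if (!pp->loaded[i]) {
            pp->loaded[i] = true;
            if (pp->sending < 0)
                pp->sending = i;      /* wire idle: start at once */
            return i;
        }
    }
    return -1;
}

/* Transmit-complete interrupt: free the finished slot and start the
 * other one immediately if it is already loaded. */
static void pp_tx_done(struct pingpong *pp)
{
    int done = pp->sending;
    if (done < 0)
        return;
    pp->loaded[done] = false;
    int other = 1 - done;
    pp->sending = pp->loaded[other] ? other : -1;
}
```

The point to notice is in `pp_tx_done()`: if the second buffer was filled while the first was going out, the next transmit starts without waiting for the host to copy anything.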

OK. So you have decided that you want to write a driver for the Foobar Ethernet card, as you have the programming information, and it hasn't been done yet. (...these are the two main requirements ;-) You should start with the skeleton network driver that is provided with the Linux kernel source tree. It can be found in the file /usr/src/linux/drivers/net/skeleton.c in all recent kernels. Also have a look at the Kernel Hackers Guide, at the following URL: KHG

8.3 Driver interface to the kernel

Here are some notes on the functions that you would have to write if creating a new driver. Reading this in conjunction with the above skeleton driver may help clear things up.

Probe

Called at boot to check for existence of card. Best if it can check unobtrusively by reading from memory, etc. Can also read from i/o ports. Initial writing to i/o ports in a probe is not good as it may kill another device. Some device initialization is usually done here (allocating i/o space, IRQs, filling in the dev->??? fields etc.) You need to know what io ports/mem the card can be configured to, how to enable shared memory (if used) and how to select/enable interrupt generation, etc.

Interrupt handler

Called by the kernel when the card posts an interrupt. This has the job of determining why the card posted an interrupt, and acting accordingly. Usual interrupt conditions are data to be received, transmit completed, and error conditions being reported. You need to know any relevant interrupt status bits so that you can act accordingly.

Transmit function

Linked to dev->hard_start_xmit() and is called by the kernel when there is some data that the kernel wants to put out over the device. This puts the data onto the card and triggers the transmit. You need to know how to bundle the data and how to get it onto the card (shared memory copy, PIO transfer, DMA?) and in the right place on the card. Then you need to know how to tell the card to send the data down the wire, and (possibly) post an interrupt when done. When the hardware can't accept additional packets it should set the dev->tbusy flag. When additional room is available, usually during a transmit-complete interrupt, dev->tbusy should be cleared and the higher levels informed with mark_bh(INET_BH).

Receive function

Called by the kernel interrupt handler when the card reports that there is data on the card. It pulls the data off the card, packages it into a sk_buff and lets the kernel know the data is there for it by doing a netif_rx(sk_buff). You need to know how to enable interrupt generation upon Rx of data, how to check any relevant Rx status bits, and how to get that data off the card (again shared memory, PIO, DMA, etc.).

Open function

Linked to dev->open and called by the networking layers when somebody does ifconfig eth0 up - this puts the device on line and enables it for Rx/Tx of data. Any special initialization incantations that were not done in the probe sequence (enabling IRQ generation, etc.) would go in here.

Close function (optional)

This puts the card in a sane state when someone does ifconfig eth0 down. It should free the IRQs and DMA channels if the hardware permits, and turn off anything that will save power (like the transceiver).

Miscellaneous functions

Things like a reset function, so that if things go south, the driver can try resetting the card as a last ditch effort. Usually done when a Tx times out or similar. Also a function to read the statistics registers of the card if so equipped.
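To see how these pieces hang together, here is a user-space mock of the hooks described above for the imaginary Foobar card. It is a sketch under heavy simplification: `struct mock_device` is not the kernel's device structure, the argument types are invented, and only the fields mentioned in this section are modelled.

```c
#include <stddef.h>

/* Simplified, user-space stand-in for the kernel's device structure.
 * The real structure has many more fields and different argument types. */
struct mock_device {
    const char *name;
    int tbusy;       /* nonzero while the card cannot take another packet */
    int up;
    int (*open)(struct mock_device *dev);
    int (*stop)(struct mock_device *dev);
    int (*hard_start_xmit)(const void *data, size_t len,
                           struct mock_device *dev);
};

static int foobar_open(struct mock_device *dev)
{
    dev->up = 1;      /* real driver: enable IRQ generation, Rx/Tx, etc. */
    dev->tbusy = 0;
    return 0;
}

static int foobar_close(struct mock_device *dev)
{
    dev->up = 0;      /* real driver: free IRQ/DMA, power down transceiver */
    return 0;
}

static int foobar_xmit(const void *data, size_t len, struct mock_device *dev)
{
    (void)data;
    (void)len;
    if (dev->tbusy)
        return 1;     /* refuse: no room on the card */
    dev->tbusy = 1;   /* single Tx buffer: busy until transmit-complete */
    return 0;
}

/* What a transmit-complete interrupt would do for this mock. */
static void foobar_tx_complete(struct mock_device *dev)
{
    dev->tbusy = 0;
}

/* Probe: having found the card, fill in the hooks. */
static int foobar_probe(struct mock_device *dev)
{
    dev->name = "eth0";
    dev->tbusy = 0;
    dev->up = 0;
    dev->open = foobar_open;
    dev->stop = foobar_close;
    dev->hard_start_xmit = foobar_xmit;
    return 0;
}
```

The mock keeps a single transmit buffer, so a second call to the transmit hook is refused until `foobar_tx_complete()` clears `tbusy`. A real driver would also wake the upper layers at that point.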

8.4 Interrupts and Linux

There are two kinds of interrupt handlers in Linux: fast ones and slow ones. You decide what kind you are installing by the flags you pass to irqaction(). The fast ones, such as the serial interrupt handler, run with _all_ interrupts disabled. The normal interrupt handlers, such as the one for ethercard drivers, run with other interrupts enabled.

There is a two-level interrupt structure. The `fast' part handles the device register, removes the packets, and perhaps sets a flag. After it is done, and interrupts are re-enabled, the slow part is run if the flag is set.

The flag between the two parts is set by:

mark_bh(INET_BH);

Usually this flag is set within dev_rint() during a received-packet interrupt, and set directly by the device driver during a transmit-complete interrupt.
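The hand-off between the two parts can be modelled in a few lines of ordinary C. This is a user-space sketch only: `bh_pending` plays the role of the flag set by mark_bh(INET_BH), and the queue, the counters and the function names are all hypothetical.

```c
#include <stdbool.h>

#define QUEUE_MAX 8

/* User-space model of the two-level scheme. */
struct two_level {
    int  queue[QUEUE_MAX];  /* packet ids pulled off the card */
    int  queued;
    bool bh_pending;        /* stands in for mark_bh(INET_BH) */
    int  processed;         /* packets handled by the slow part */
};

/* Fast part: grab the packet, note that work is pending, and return.
 * Returns false if the queue is full and the packet was dropped. */
static bool fast_part(struct two_level *s, int packet_id)
{
    if (s->queued == QUEUE_MAX)
        return false;
    s->queue[s->queued++] = packet_id;
    s->bh_pending = true;
    return true;
}

/* Slow part: runs later with interrupts enabled, and only does
 * anything if the flag is set. */
static void slow_part(struct two_level *s)
{
    if (!s->bh_pending)
        return;
    s->bh_pending = false;
    s->processed += s->queued;
    s->queued = 0;
}
```

Several calls to `fast_part()` can pile up before one call to `slow_part()` drains them, which is why the fast part has to stay short and bounded.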

You might wonder why all interrupt handlers cannot run in `normal mode' with other interrupts enabled. Ross Biro uses this scenario to illustrate the problem:

You get a serial interrupt, and start processing it. The serial interrupt is now masked.
You get a network interrupt, and you start transferring a maximum-sized 1500 byte packet from the card.
Another character comes in, but this time the interrupts are masked!

The `fast' interrupt structure solves this problem by allowing bounded-time interrupt handlers to run without the risk of leaving their interrupt lines masked by another interrupt request.

There is an additional distinction between fast and slow interrupt handlers -- the arguments passed to the handler. A `slow' handler is defined as


                static void
                handle_interrupt(int reg_ptr)
                {
                    int irq = -(((struct pt_regs *)reg_ptr)->orig_eax+2);
                    struct device *dev = irq2dev_map[irq];
                ...

While a fast handler gets the interrupt number directly


                static void
                handle_fast_interrupt(int irq)
                {
                ...

A final aspect of network performance is latency. The only board that really addresses this is the 3c509, which allows a predictive interrupt to be posted. It provides an interrupt response timer so that the driver can fine-tune how early an interrupt is generated.

8.5 Programming the Intel chips (i82586 and i82593)

These chips are used on a number of cards, namely the 3c507 ('86), the Intel EtherExpress 16 ('86), Microdyne's exos205t ('86), the Z-Note ('93), and the Racal-Interlan ni5210 ('86).

Russ Nelson writes: `Most boards based on the 82586 can reuse quite a bit of their code. More, in fact, than the 8390-based adapters. There are only three differences between them:

The code to get the Ethernet address,
The code to trigger CA on the 82586, and
The code to reset the 82586.

The Intel EtherExpress 16 is an exception, as it I/O maps the 82586. Yes, I/O maps it. Fairly clunky, but it works.

Garrett Wollman did an AT&T driver for BSD that uses the BSD copyright. The latest version I have (Sep '92) only uses a single transmit buffer. You can and should do better than this if you've got the memory. The AT&T and 3c507 adapters do; the ni5210 doesn't.

The people at Intel gave me a very big clue on how you queue up multiple transmit packets. You set up a list of (NOP -> XMIT -> NOP -> XMIT -> NOP -> XMIT -> beginning) blocks, then you set the `next' pointer of all the NOP blocks to themselves. Now you start the command unit on this chain. It continually processes the first NOP block. To transmit a packet, you stuff it into the next transmit block, then point the NOP to it. To transmit the next packet, you stuff the next transmit block and point the previous NOP to it. In this way, you don't have to wait for the previous transmit to finish, you can queue up multiple packets without any ambiguity as to whether it got accepted, and you can avoid the command unit start-up delay.'
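The NOP/XMIT trick is easier to follow with a model in front of you. The sketch below simulates the chain with C pointers. It is not 82586 code: the real command blocks live in shared memory and are linked by offsets, and the names, the ring size of three, and the way the NOP is re-armed after a transmit are all assumptions made for the example. There is also no check for a full ring.

```c
#include <stddef.h>

#define RING 3

enum cmd_type { CMD_NOP, CMD_XMIT };

/* Hypothetical model of a command block. */
struct cmd {
    enum cmd_type type;
    struct cmd *next;
    int packet_id;          /* payload stand-in for XMIT blocks */
};

struct chain {
    struct cmd nop[RING];
    struct cmd xmit[RING];
    struct cmd *cu;         /* block the command unit will execute next */
    int tail;               /* next NOP/XMIT pair the host will use */
};

static void chain_init(struct chain *c)
{
    for (int i = 0; i < RING; i++) {
        c->nop[i].type = CMD_NOP;
        c->nop[i].next = &c->nop[i];             /* NOP loops on itself */
        c->nop[i].packet_id = -1;
        c->xmit[i].type = CMD_XMIT;
        c->xmit[i].next = &c->nop[(i + 1) % RING];
        c->xmit[i].packet_id = -1;
    }
    c->cu = &c->nop[0];
    c->tail = 0;
}

/* Host side: fill the next XMIT block, then release it by repointing
 * the NOP in front of it. */
static void chain_queue(struct chain *c, int packet_id)
{
    int i = c->tail;
    c->xmit[i].packet_id = packet_id;
    c->nop[i].next = &c->xmit[i];
    c->tail = (i + 1) % RING;
}

/* Command-unit side: execute one block.  Returns the packet id sent,
 * or -1 if the block was a NOP. */
static int chain_step(struct chain *c)
{
    struct cmd *cur = c->cu;
    c->cu = cur->next;
    if (cur->type == CMD_XMIT) {
        /* Stand-in for the host's transmit-complete handling: re-arm
         * the NOP in front of this block so the ring can be reused. */
        ptrdiff_t i = cur - c->xmit;
        c->nop[i].next = &c->nop[i];
        return cur->packet_id;
    }
    return -1;
}
```

Queueing two packets back to back only touches two `next' pointers, and the simulated command unit then walks NOP, XMIT, NOP, XMIT and comes to rest spinning on the third NOP.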

8.6 Technical information from 3Com

If you are interested in working on drivers for 3Com cards, you can get technical documentation from 3Com. Cameron has been kind enough to tell us how to go about it below:

3Com's Ethernet Adapters are documented for driver writers in our `Technical References' (TRs). These manuals describe the programmer interfaces to the boards but they don't talk about the diagnostics, installation programs, etc that end users can see.

The Network Adapter Division marketing department has the TRs to give away. To keep this program efficient, we centralized it in a thing called `CardFacts.' CardFacts is an automated phone system. You call it with a touch-tone phone and it faxes you stuff. To get a TR, call CardFacts at 408-727-7021. Ask it for Developer's Order Form, document number 9070. Have your fax number ready when you call. Fill out the order form and fax it to 408-764-5004. Manuals are shipped by Federal Express 2nd Day Service.

After you get a manual, if you still can't figure out how to program the board, try our `CardBoard' BBS at 1-800-876-3266, and if you can't do that, write Andy_Chan@3Mail.3com.com and ask him for alternatives. If you have a real stumper that nobody has figured out yet, the fellow who needs to know about it is Steve_Lebus@3Mail.3com.com.

There are people here who think we are too free with the manuals, and they are looking for evidence that the system is too expensive, or takes too much time and effort. That's why it's important to try to use CardFacts before you start calling and mailing the people I named here.

There are even people who think we should be like Diamond and Xircom, requiring tight `partnership' with driver writers to prevent poorly performing drivers from getting written. So far, 3Com customers have been really good about this, and there's no problem with the level of requests we've been getting. We need your continued cooperation and restraint to keep it that way.

        Cameron Spitzer, 408-764-6339
        3Com NAD
        Santa Clara
        work: camerons@nad.3com.com
        home: cls@truffula.sj.ca.us

8.7 Notes on AMD PCnet / LANCE Based cards

The AMD LANCE (Local Area Network Controller for Ethernet) was the original offering, and has since been replaced by the `PCnet-ISA' chip, otherwise known as the 79C960. This relatively new chip is the heart of many new cards being released at present. Note that the name `LANCE' has stuck, and some people will refer to the new chip by the old name. Dave Roberts of the Network Products Division of AMD was kind enough to contribute the following information regarding this chip:

`As for the architecture itself, AMD developed it originally and reduced it to a single chip -- the PCnet(tm)-ISA -- over a year ago. It's been selling like hotcakes ever since.

Functionally, it is equivalent to a NE1500. The register set is identical to the old LANCE with the 1500/2100 architecture additions. Older 1500/2100 drivers will work on the PCnet-ISA. The NE1500 and NE2100 architecture is basically the same. Initially Novell called it the 2100, but then tried to distinguish between coax and 10BASE-T cards. Anything that was 10BASE-T only was to be numbered in the 1500 range. That's the only difference.

Many companies offer PCnet-ISA based products, including HP, Racal-Datacom, Allied Telesis, Boca Research, Kingston Technology, etc. The cards are basically the same except that some manufacturers have added `jumperless' features that allow the card to be configured in software. Most have not. AMD offers a standard design package for a card that uses the PCnet-ISA and many manufacturers use our design without change. What this means is that anybody who wants to write drivers for most PCnet-ISA based cards can just get the data-sheet from AMD. Call our literature distribution center at (800)222-9323 and ask for the Am79C960, PCnet-ISA data sheet. It's free.

A quick way to understand whether the card is a `stock' card is to just look at it. If it's stock, it should just have one large chip on it, a crystal, a small IEEE address PROM, possibly a socket for a boot ROM, and a connector (1, 2, or 3, depending on the media options offered). Note that if it's a coax card, it will have some transceiver stuff built onto it as well, but that should be near the connector and away from the PCnet-ISA.'

There is also some info regarding the LANCE chip in the file lance.c which is included in the standard kernel.

A note to would-be card hackers is that different LANCE implementations do `restart' in different ways. Some pick up where they left off in the ring, and others start right from the beginning of the ring, as if just initialised. This is a concern when setting the multicast list.

8.8 Multicast and Promiscuous Mode

Another one of the things Donald has worked on is implementing multicast and promiscuous mode hooks. All of the released (i.e. not ALPHA) ISA drivers now support promiscuous mode.

Donald writes: `At first I was planning to do it while implementing either the /dev/* or DDI interface, but that's not really the correct way to do it. We should only enable multicast or promiscuous modes when something wants to look at the packets, and shut it down when that application is finished, neither of which is strongly related to when the hardware is opened or released.

I'll start by discussing promiscuous mode, which is conceptually easy to implement. For most hardware you only have to set a register bit, and from then on you get every packet on the wire. Well, it's almost that easy; for some hardware you have to shut the board down (potentially dropping a few packets), reconfigure it, and then re-enable the ethercard. This is grungy and risky, but the alternative seems to be to have every application register before you open the ethercard at boot-time.

OK, so that's easy, so I'll move on to something that's not quite so obvious: Multicast. It can be done two ways:

Use promiscuous mode, and a packet filter like the Berkeley packet filter (BPF). The BPF is a pattern matching stack language, where you write a program that picks out the addresses you are interested in. Its advantage is that it's very general and programmable. Its disadvantage is that there is no general way for the kernel to avoid turning on promiscuous mode and running every packet on the wire through every registered packet filter. See The Berkeley Packet Filter for more info.
Use the built-in multicast filter that most etherchips have.

I guess I should list what a few ethercards/chips provide:

        
        Chip/card        Promiscuous  Multicast filter
        ---------------------------------------------------------------
        Seeq8001/3c501   Yes          Binary filter (1)
        3Com/3c509       Yes          Binary filter (1)
        8390             Yes          Autodin II six bit hash (2) (3)
        LANCE            Yes          Autodin II six bit hash (2) (3)
        i82586           Yes          Hidden Autodin II six bit hash (2) (4)

(1) These cards claim to have a filter, but it's a simple yes/no `accept all multicast packets', or `accept no multicast packets'.
(2) AUTODIN II is the standard ethernet CRC (checksum) polynomial. In this scheme multicast addresses are hashed and looked up in a hash table. If the corresponding bit is enabled, this packet is accepted. Ethernet packets are laid out so that the hardware to do this is trivial -- you just latch six (usually) bits from the CRC circuit (needed anyway for error checking) after the first six octets (the destination address), and use them as an index into the hash table (six bits -- a 64-bit table).
(3) These chips use the six bit hash, and must have the table computed and loaded by the host. This means the kernel must include the CRC code.
(4) The 82586 uses the six bit hash internally, but it computes the hash table itself from a list of multicast addresses to accept.

Note that none of these chips do perfect filtering, and we still need a middle-level module to do the final filtering. Also note that in every case we must keep a complete list of accepted multicast addresses to recompute the hash table when it changes.

My first pass at device-level support is detailed in the outline driver skeleton.c

It looks like the following:

        #ifdef HAVE_MULTICAST
        static void set_multicast_list(struct device *dev, int num_addrs,
                         void *addrs);
        #endif
        .
        .
        
        ethercard_open() {
        ...
        #ifdef HAVE_MULTICAST
                dev->set_multicast_list = &set_multicast_list;
        #endif
        ...
        
        #ifdef HAVE_MULTICAST
        /* Set or clear the multicast filter for this adaptor.
           num_addrs -- -1      Promiscuous mode, receive all packets
           num_addrs -- 0       Normal mode, clear multicast list
           num_addrs > 0        Multicast mode, receive normal and
                MC packets, and do best-effort filtering.
         */
        static void
        set_multicast_list(struct device *dev, int num_addrs, void *addrs)
        {
        ...

Any comments, criticism, etc. are welcome.'
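For chips where the host has to compute the hash table itself, the work looks roughly like the sketch below. It pairs a bitwise CRC over the six address octets with a mock of the num_addrs convention from the outline above. Several things here are assumptions: the choice of the top six CRC bits as the index (which bits a chip latches, and in what order, varies, so check the data sheet), the flat array of packed addresses, and every name other than the num_addrs values.

```c
#include <stdint.h>
#include <stdbool.h>

#define ETH_ALEN 6

/* Bitwise CRC-32 (AUTODIN II polynomial 0x04C11DB7) over an address,
 * feeding each octet least-significant bit first.  This returns the raw
 * shift-register value, with no final inversion applied. */
static uint32_t ether_crc(const uint8_t *data, int len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (int i = 0; i < len; i++) {
        uint8_t octet = data[i];
        for (int bit = 0; bit < 8; bit++, octet >>= 1) {
            uint32_t msb = crc >> 31;
            crc <<= 1;
            if (msb ^ (octet & 1u))
                crc ^= 0x04C11DB7u;
        }
    }
    return crc;
}

/* Assumption: the chip uses the top six CRC bits as the table index. */
static unsigned int mc_hash_index(const uint8_t *addr)
{
    return ether_crc(addr, ETH_ALEN) >> 26;
}

struct mc_filter {
    bool promiscuous;
    uint64_t hash;     /* the 64-bit table that would be loaded into the chip */
};

/* Mock of the num_addrs convention: -1 means promiscuous, 0 clears the
 * list, > 0 rebuilds the table from `addrs` (num_addrs * 6 octets). */
static void mock_set_multicast_list(struct mc_filter *f, int num_addrs,
                                    const uint8_t *addrs)
{
    f->promiscuous = (num_addrs < 0);
    f->hash = 0;
    for (int i = 0; i < num_addrs; i++)
        f->hash |= (uint64_t)1 << mc_hash_index(addrs + i * ETH_ALEN);
}

/* What the hardware filter would let through.  Different addresses can
 * share a bit, so false positives are possible. */
static bool mc_accepts(const struct mc_filter *f, const uint8_t *addr)
{
    return f->promiscuous || ((f->hash >> mc_hash_index(addr)) & 1u);
}
```

Because the table is rebuilt from scratch on every call, the caller has to hand over the complete list of wanted addresses each time, which is exactly why the list must be kept around.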

8.9 The Berkeley Packet Filter (BPF)

The general idea of the developers is that the BPF functionality should not be provided by the kernel, but should be in a (hopefully little-used) compatibility library.

For those not in the know: BPF (the Berkeley Packet Filter) is a mechanism for specifying to the kernel networking layers what packets you are interested in. It's implemented as a specialized stack language interpreter built into a low level of the networking code. An application passes a program written in this language to the kernel, and the kernel runs the program on each incoming packet. If the kernel has multiple BPF applications, each program is run on each packet.

The problem is that it's difficult to deduce what kind of packets the application is really interested in from the packet filter program, so the general solution is to always run the filter. Imagine a program that registers a BPF program to pick up a low data-rate stream sent to a multicast address. Most ethernet cards have a hardware multicast address filter implemented as a 64 entry hash table that ignores most unwanted multicast packets, so the capability exists to make this a very inexpensive operation. But with the BPF the kernel must switch the interface to promiscuous mode, receive _all_ packets, and run them through this filter. This is work, BTW, that's very difficult to account back to the process requesting the packets.
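The cost is easy to see in a toy model. In the sketch below each registered filter is run on every packet, and the invocations are counted. A real BPF filter is a small interpreted program; a plain C predicate stands in for it here only to keep the example short, and all of the names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for a registered filter program. */
typedef bool (*packet_filter)(const unsigned char *pkt, size_t len);

struct filter_stats {
    unsigned long runs;       /* filter invocations */
    unsigned long matches;    /* invocations that accepted a packet */
};

/* Run every registered filter on every packet, as the text describes. */
static void run_filters(const packet_filter *filters, size_t nfilters,
                        const unsigned char *const *pkts, const size_t *lens,
                        size_t npkts, struct filter_stats *st)
{
    st->runs = st->matches = 0;
    for (size_t p = 0; p < npkts; p++) {
        for (size_t f = 0; f < nfilters; f++) {
            st->runs++;
            if (filters[f](pkts[p], lens[p]))
                st->matches++;
        }
    }
}

/* Example filter: accept frames whose destination address has the
 * multicast (group) bit set, i.e. the low bit of the first octet. */
static bool wants_multicast(const unsigned char *pkt, size_t len)
{
    return len >= 6 && (pkt[0] & 1u);
}
```

The number of filter runs grows as packets times filters regardless of how few packets anybody wants, whereas a hardware multicast filter would have discarded most of the unwanted frames before the host ever saw them.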