Andrey Borzenkov [Sat, 15 Nov 2008 14:15:09 +0000 (17:15 +0300)]
orinoco: indicate it is using dBm in wireless_stats and spy
Since WE7 /proc/net/wireless checks whether level and noise are in dBm
and shows them accordingly. Indicate that we return signal and noice
levels in dBm.
Before:
Inter-| sta-| Quality | Discarded packets | Missed | WE
face | tus | link level noise | nwid crypt frag retry misc | beacon | 22
eth1: 0000 65. 219. 165. 0 0 148 41 0 0
After:
Inter-| sta-| Quality | Discarded packets | Missed | WE
face | tus | link level noise | nwid crypt frag retry misc | beacon | 22
eth1: 0000 65. -37. -91. 0 0 0 0 0 0
While at it, replace raw numbers with appropriate macro.
Signed-off-by: Andrey Borzenkov <arvidjaar@mail.ru> Signed-off-by: Pavel Roskin <proski@gnu.org> Signed-off-by: John W. Linville <linville@tuxdriver.com>
mac80211: don't assume driver has been attached on registration
mac80211's ieee80211_register_hw() is often called within the
probe path so it cannot assume the device's driver structure
has been attached yet so to create a workqueue instead of
using driver->name use the wiphy's phy%d name. The name doesn't
really matter anyway.
This should fix sporadic oopses found when we race to beat the
driver pointer setting. Not even sure how this was working properly.
Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com> Acked-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>
p54: honour bss_info_changed's basic_rates and other settings
As was pointed out in "p54: honour bss_info_changed's short slot time settings",
bss_info_changed provides more useful settings that can be used by the driver.
Signed-off-by: Christian Lamparter <chunkeey@web.de> Signed-off-by: John W. Linville <linville@tuxdriver.com>
Sujith [Fri, 14 Nov 2008 10:57:53 +0000 (16:27 +0530)]
mac80211: Use the HT capabilities from the IE instead of the station's caps.
Signed-off-by: Sujith <Sujith.Manoharan@atheros.com> Acked-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>
Make the DMAable mameory consistent with pci_set_consistent_dma_mask().
The DMA-mapping.txt Documentation recommends this but for PCI-X
considerations and on strange architecture like SGI SN2, not sure
why it would fix an issue but lets see if it does, just in case.
Before this, this driver was tested with x86_64 with about
7 GB of RAM, not sure if this is really needed.
Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
Ivo van Doorn [Thu, 13 Nov 2008 22:07:33 +0000 (23:07 +0100)]
rt2x00: Detect USB BULK in/out endpoints
Instead of hardcoding the used in/out endpoints
we should detect them by walking through all
available endpoints.
rt2800usb will gain the most out of this, because
the legacy drivers indicate that there are multiple
endpoints available.
However this code might benefit at least rt73usb as
well for the MIMO queues, and if we are really lucky
rt2500usb will benefit because for the TX and PRIO
queues.
Even if rt2500usb and rt73usb do not get better performance
after this patch, the endpoint detection still belongs to
rt2x00usb, and it shouldn't hurt to always try to detect
the available endpoints.
Signed-off-by: Ivo van Doorn <IvDoorn@gmail.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
Larry Finger [Thu, 13 Nov 2008 18:30:41 +0000 (12:30 -0600)]
rtl8187: Remove module warning and dependence on CONFIG_EXPERIMENTAL
After considerable testing, the initial fears that the driver might damage
some flavors of RTL8187B hardware seem to be groundless. Accordingly, the
logged warning is removed. In addition, Kconfig is changed to remove the
dependence on EXPERIMENTAL.
Signed-off-by: Larry Finger <Larry.Finger@lwfinger> Acked-by: Herton Ronaldo Krzesinski <herton@mandriva.com.br> Acked-by: Hin-Tak Leung <htl10@users.sourceforge.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>
rtl8187: feedback transmitted packets using tx close descriptor for 8187B
Realtek 8187B has a receive command queue to feedback beacon interrupt
and transmitted packet status. Use it to feedback mac80211 about status
of transmitted packets. Unfortunately in the course of testing I found
that the sequence number reported by hardware includes entire sequence
control in a 12 bit only field, so a workaround is done to check only
lowest bits.
Tested-by: Larry Finger <Larry.Finger@lwfinger.net> Tested-by: Hin-Tak Leung <htl10@users.sourceforge.net> Signed-off-by: Herton Ronaldo Krzesinski <herton@mandriva.com.br> Signed-off-by: John W. Linville <linville@tuxdriver.com>
rtl8187: implement conf_tx callback to configure tx queues
Add conf_tx callback and use it to configure tx queues of 8187L/8187B.
Tested-by: Larry Finger <Larry.Finger@lwfinger.net> Tested-by: Hin-Tak Leung <htl10@users.sourceforge.net> Signed-off-by: Herton Ronaldo Krzesinski <herton@mandriva.com.br> Signed-off-by: John W. Linville <linville@tuxdriver.com>
Hin-Tak Leung reported that after the change "rtl8187: add short slot
handling for 8187B" his RTL8187B started to give low throughput on
network transfers. Turns out that the SIFS setting used isn't ok, it
doesn't look to be the real aSIFSTime, using the "magical" 0x22 value
like on other 818x variants as the vendor does too fixes the issue.
Tested-by: Larry Finger <Larry.Finger@lwfinger.net> Tested-by: Hin-Tak Leung <htl10@users.sourceforge.net> Signed-off-by: Herton Ronaldo Krzesinski <herton@mandriva.com.br> Signed-off-by: John W. Linville <linville@tuxdriver.com>
ath9k: Race condition in accessing TX and RX buffers.
Race condition causes RX buffers to be accessed even before it is
initialized. The RX and TX buffers are initialized immediately after
the hardware is registered with mac80211. The mac80211 start callback
is ready to be fired once the device is registered for a case when the
wpa_supplicant is also running at the same time.
The same race condition is also possible for RKFILL registration
as RFKILL init happens after the device registration with mac80211
and it is possible that rfkill_register would be called even before
it is initialized.
Signed-off-by: Senthil Balasubramanian <senthilkumar@atheros.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
ath9k: Build RFKILL feature even when RFKILL subsystem is a MODULE
Currently, ath9k builds RFKILL feature only when the RFKILL subsystem
is built part of the kernel. Build RFKILL feature regardless of whether
RFKILL subsystem is built as a MODULE or part of the kernel.
Signed-off-by: Senthil Balasubramanian <senthilkumar@atheros.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
This enables the custom firmware regulatory solution option
on iwlwifi drivers. These devices are uncapable of mapping their
EEPROM regulatory domain to a specific ISO / IEC alpha2.
Although the new 11n devices (>= iwl 5000) have only
3 regultaory SKUs -- MOW, ABG (no N) and BG -- the older
devices (3945 and 4965) have a more complex SKU arrangement
and therefore its not practical to move this to the driver.
Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com> Acked-by: Zhu Yi <yi.zhu@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
cfg80211: add support for custom firmware regulatory solutions
This adds API to cfg80211 to allow wireless drivers to inform
us if their firmware can handle regulatory considerations *and*
they cannot map these regulatory domains to an ISO / IEC 3166
alpha2. In these cases we skip the first regulatory hint instead
of expecting the driver to build their own regulatory structure,
providing us with an alpha2, or using the reg_notifier().
Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com> Acked-by: Zhu Yi <yi.zhu@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
This adds country IE parsing to mac80211 and enables its usage
within the new regulatory infrastructure in cfg80211. We parse
the country IEs only on management beacons for the BSSID you are
associated to and disregard the IEs when the country and environment
(indoor, outdoor, any) matches the already processed country IE.
To avoid following misinformed or outdated APs we build and use
a regulatory domain out of the intersection between what the AP
provides us on the country IE and what CRDA is aware is allowed
on the same country.
A secondary device is allowed to follow only the same country IE
as it make no sense for two devices on a system to be in two
different countries.
In the case the AP is using country IEs for an incorrect country
the user may help compliance further by setting the regulatory
domain before or after the IE is parsed and in that case another
intersection will be performed.
CONFIG_WIRELESS_OLD_REGULATORY is supported but requires CRDA
present.
Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com> Acked-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>
cfg80211: mark regdomains with > NL80211_MAX_SUPP_REG_RULES invalid
Lets remain consistent and mark rds with > NL80211_MAX_SUPP_REG_RULES
number of reg rules as invalid in is_valid_rd().
Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com> Acked-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>
cfg80211: call_crda() won't tell us if CRDA was present
kobject_uevent_env() can return an error but it just tells us
if the uvent was built/sent or not, it doesn't tell us anything
about what happened in userspace, whether the udev rule was present
nor does it tell us if CRDA was present or not. So remove
the informative complaint about it assuming it will tell us
such things.
Note that you can determine if CRDA is present after loading cfg80211
by using:
is_old_static_regdom(cfg80211_regdomain)
but this doesn't account for possible user install after initial
boot, and also for when the user uses the static EU regulatory
domain.
Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com> Acked-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>
cfg80211: expect different rd in cfg80211 when intersecting
When intersecting it is possible that set_regdom() was called
with a regulatory domain which we'll only use as an aid to
build a final regulatory domain.
Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com> Acked-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>
cfg80211: separate intersection section in __set_regdom()
So far the __set_regdom() code is pretty generic as the
intersection case is fairly straight forward; this will however
change when 802.11d support is added so lets separate intersection
code for now in preparation for 802.11d support.
This patch only has slight functional changes.
Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com> Acked-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>
We have control over the REGDOM_SET_BY_* macros passed
so remove the switch.
This patch has no functional changes.
Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com> Acked-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>
We have complete control over REGDOM_SET_BY_* enum passed
down to __regulatory_hint() as such there is no need to
account for unexpected REGDOM_SET_BY_*'s, lets just remove
the switch statement as this code does not change and
won't change even when we add 802.11d support.
This patch has no functional changes.
Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com> Acked-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>
Regulatory rules with negative frequencies are now
marked as invalid in is_valid_reg_rule().
Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com> Acked-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>
Tomas Winkler [Wed, 12 Nov 2008 21:14:11 +0000 (13:14 -0800)]
iwlwifi: iwl-fh.h cleanup
This patch fix value of upper FH register bound plus
it reorders and groups registers in more readable way
Signed-off-by: Tomas Winkler <tomas.winkler@intel.com> Acked-by: Zhu Yi <yi.zhu@intel.com> Signed-off-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
Zhu, Yi [Wed, 12 Nov 2008 21:14:10 +0000 (13:14 -0800)]
iwlwifi: some fh document fix and cleanup
This patch cleans up some flow handler related document. It also
removes some blank lines.
Signed-off-by: Zhu Yi <yi.zhu@intel.com> Acked-by: Tomas Winkler <tomas.winkler@intel.com> Signed-off-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
Zhu, Yi [Wed, 12 Nov 2008 21:14:09 +0000 (13:14 -0800)]
iwlwifi: configure_filter rewrite
The patch rewrites the mac80211 configure_filter handler to better mapping
mac80211 filter flags to iwlwifi hardware filter flags. We now can support
5 mac80211 filter flags: FIF_OTHER_BSS, FIF_ALLMULTI, FIF_PROMISC_IN_BSS,
FIF_BCN_PRBRESP_PROMISC and FIF_CONTROL. This patch also avoids reconnecting
if the filter flags are changed when the STA is associated. Because rx_assoc
is used when full rxon is not necessary.
Signed-off-by: Zhu Yi <yi.zhu@intel.com> Signed-off-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
Chatre, Reinette [Wed, 12 Nov 2008 21:14:07 +0000 (13:14 -0800)]
iwlwifi: replace magic constants with define
use IWL_CCK_RATES_MASK and IWL_OFDM_RATES_MASK instead of
their values directly.
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
cc: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
Tomas Winkler [Wed, 12 Nov 2008 21:14:06 +0000 (13:14 -0800)]
iwlwifi: rs: remove fc variable and other cleanups
This patch
1. Removes use once use only fc variables, they are useless after refactoring
ieee80211 frame control handlers
2. Other trivial cleanups
Signed-off-by: Tomas Winkler <tomas.winkler@intel.com> Signed-off-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
Tomas Winkler [Wed, 12 Nov 2008 21:14:05 +0000 (13:14 -0800)]
iwlwifi: consolidate station management code
This patch moves code around and group most of the station
management code into iwl-sta.c
No functional changes (yet)
Signed-off-by: Tomas Winkler <tomas.winkler@intel.com> Signed-off-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
Ivo van Doorn [Tue, 11 Nov 2008 23:01:37 +0000 (00:01 +0100)]
rt2x00: Fix TX failure path
The callback function write_tx_data() can only fail
when our ENTRY_OWNER_DEVICE_DATA flag on a queue entry
failed to determine the entry was not available and
it is in fact still owned by the hardware.
This means that if that function fails the queue
must be stopped in mac80211.
When rt2x00queue_get_queue() returns NULL in the TX
path, it means mac80211 has passed us an invalid queue,
although this should be impossible, it shouldn't hurt
if we send mac80211 a signal to stop the queue either.
Both issues can simply be resolved by removing their
manual failure handler and making them use the failure path
provided in rt2x00mac_tx().
Signed-off-by: Ivo van Doorn <IvDoorn@gmail.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
Ivo van Doorn [Mon, 10 Nov 2008 18:42:18 +0000 (19:42 +0100)]
rt2x00: Move rt73usb register access wrappers into rt2x00usb
rt2500usb and rt73usb have different register word sizes,
for that reason the register access wrappers were never
moved into rt2x00usb.
With rt2800usb on its way, we should favor the 32bit
register access and move those wrappers into rt2x00usb.
That saves duplicate code, since only rt2500usb will
need the special 16bit wrappers.
Signed-off-by: Ivo van Doorn <IvDoorn@gmail.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
Ivo van Doorn [Mon, 10 Nov 2008 18:41:40 +0000 (19:41 +0100)]
rt2x00: Cleanup indirect register access
All code which accessed indirect registers was similar
in respect to the for-loop, the given timeout, etc.
Move it into a seperate function, which for PCI drivers
can be moved into rt2x00pci.
This allows us to cleanup the cleanup the code further
by removing the goto statementsand making the codepath
look a bit nicer.
Signed-off-by: Ivo van Doorn <IvDoorn@gmail.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
Johannes Berg [Mon, 10 Nov 2008 17:56:59 +0000 (18:56 +0100)]
ath5k: name pci driver "ath5k" too
Call the ath5k pci driver struct "ath5k" too to be less
confusing in sysfs.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Acked-by: Nick Kossifidis <mickflemm@gmail.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
Alexander Duyck [Tue, 25 Nov 2008 09:04:03 +0000 (01:04 -0800)]
igb: loopback bits not correctly cleared from RCTL register
This change forces the bits to 0 by using an &= operation with an inverted
mask of all options instead of using an |= with a value of 0.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Duyck [Tue, 25 Nov 2008 09:03:26 +0000 (01:03 -0800)]
igb: remove unneeded bit refrence when enabling jumbo frames
There is a reference to a Buffer Size extention bit that is unneded by
82575/82576 hardware. Since it is not needed it should be removed from the
code.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jeff Kirsher [Tue, 25 Nov 2008 09:02:08 +0000 (01:02 -0800)]
DCB: fix kconfig option
Since the netlink option for DCB is necessary to actually be useful,
simplified the Kconfig option. In addition, added useful help text for the
Kconfig option.
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Made usb_drivers reset_resume function point to hso_resume this
fixes problems a usb reset is done when the network interface
is left idle for a few minutes. Possibly reset_resume should
initialise hardware more but this works in the common case.
Signed-off-by: Denis Joseph Barrow <D.Barow@option.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Makes TIOCM ioctls for Data Carrier Detect & related functions
work like /drivers/serial/serial-core.c potentially needed
for pppd & similar user programs.
Signed-off-by: Denis Joseph Barrow <D.Barow@option.com> Signed-off-by: David S. Miller <davem@davemloft.net>
A new structure hso_mutex_table had to be declared statically
& used as as hso_device mutex_lock(&serial->parent->mutex) etc
is freed in hso_serial_open & hso_serial_close by kref_put while
the mutex is still in use.
This is a substantial change but should make the driver much stabler.
Signed-off-by: Denis Joseph Barrow <D.Barow@option.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Added check for IFF_UP in hso_resume, this should eliminate -EINVAL (-22)
errors caused from urb's being submitted twice, once by hso_resume
& once in hso_net_open, if suspend/resume USB power saving mode is enabled
Signed-off-by: Denis Joseph Barrow <D.Barow@option.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Moved serial_open_count in hso_serial_open to
prevent crashes owing to the serial structure being made NULL
when hso_serial_close is called even though hso_serial_open
returned -ENODEV, Alan Cox pointed out this happens,
also put in sanity check in hso_serial_close
to check for a valid serial structure which should prevent
the most reproducable crash in the driver when the hso device
is disconnected while in use.
Signed-off-by: Denis Joseph Barrow <D.Barow@option.com> Signed-off-by: David S. Miller <davem@davemloft.net>
As a concession to vendors who have to deal with one source for different
kernel versions, add a HAVE_NET_DEVICE_OPS so they don't end up hard
coding ifdef against kernel version.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ilpo Järvinen [Tue, 25 Nov 2008 05:30:21 +0000 (21:30 -0800)]
tcp: handle shift/merge of cloned skbs too
This caused me to get repeatably:
tcpdump: pcap_loop: recvfrom: Bad address
Happens occassionally when I tcpdump my for-looped test xfers:
while [ : ]; do echo -n "$(date '+%s.%N') "; ./sendfile; sleep 20; done
Rest of the relevant commands:
ethtool -K eth0 tso off
tc qdisc add dev eth0 root netem drop 4%
tcpdump -n -s0 -i eth0 -w sacklog.all
Running net-next under kvm, connection goes to the same host
(basically just out of kvm). The connection itself works ok
and data gets sent without corruption even with a large
number of tests while tcpdump fails usually within less than
5 tests.
Whether it only happens because of this change or not, I
don't know for sure but it's the only thing with which
I've seen that error. The non-cloned variant works w/o it
for much longer time. I'm yet to debug where the error
actually comes from.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
Ilpo Järvinen [Tue, 25 Nov 2008 05:26:56 +0000 (21:26 -0800)]
tcp: Make shifting not clear the hints
The earlier version was just very basic one which is "playing
safe" by always clearing the hints. However, clearing of a hint
is extremely costly operation with large windows, so it must be
avoided at all cost whenever possible, there is a way with
shifting too achieve not-clearing.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
Ilpo Järvinen [Tue, 25 Nov 2008 05:20:15 +0000 (21:20 -0800)]
tcp: Try to restore large SKBs while SACK processing
During SACK processing, most of the benefits of TSO are eaten by
the SACK blocks that one-by-one fragment SKBs to MSS sized chunks.
Then we're in problems when cleanup work for them has to be done
when a large cumulative ACK comes. Try to return back to pre-split
state already while more and more SACK info gets discovered by
combining newly discovered SACK areas with the previous skb if
that's SACKed as well.
This approach has a number of benefits:
1) The processing overhead is spread more equally over the RTT
2) Write queue has less skbs to process (affect everything
which has to walk in the queue past the sacked areas)
3) Write queue is consistent whole the time, so no other parts
of TCP has to be aware of this (this was not the case with
some other approach that was, well, quite intrusive all
around).
4) Clean_rtx_queue can release most of the pages using single
put_page instead of previous PAGE_SIZE/mss+1 calls
In case a hole is fully filled by the new SACK block, we attempt
to combine the next skb too which allows construction of skbs
that are even larger than what tso split them to and it handles
hole per on every nth patterns that often occur during slow start
overshoot pretty nicely. Though this to be really useful also
a retransmission would have to get lost since cumulative ACKs
advance one hole at a time in the most typical case.
TODO: handle upwards only merging. That should be rather easy
when segment is fully sacked but I'm leaving that as future
work item (it won't make very large difference anyway since
this current approach already covers quite a lot of normal
cases).
I was earlier thinking of some sophisticated way of tracking
timestamps of the first and the last segment but later on
realized that it won't be that necessary at all to store the
timestamp of the last segment. The cases that can occur are
basically either:
1) ambiguous => no sensible measurement can be taken anyway
2) non-ambiguous is due to reordering => having the timestamp
of the last segment there is just skewing things more off
than does some good since the ack got triggered by one of
the holes (besides some substle issues that would make
determining right hole/skb even harder problem). Anyway,
it has nothing to do with this change then.
I choose to route some abnormal looking cases with goto noop,
some could be handled differently (eg., by stopping the
walking at that skb but again). In general, they either
shouldn't happen at all or are rare enough to make no difference
in practice.
In theory this change (as whole) could cause some macroscale
regression (global) because of cache misses that are taken over
the round-trip time but it gets very likely better because of much
less (local) cache misses per other write queue walkers and the
big recovery clearing cumulative ack.
Worth to note that these benefits would be very easy to get also
without TSO/GSO being on as long as the data is in pages so that
we can merge them. Currently I won't let that happen because
DSACK splitting at fragment that would mess up pcounts due to
sk_can_gso in tcp_set_skb_tso_segs. Once DSACKs fragments gets
avoided, we have some conditions that can be made less strict.
TODO: I will probably have to convert the excessive pointer
passing to struct sacktag_state... :-)
My testing revealed that considerable amount of skbs couldn't
be shifted because they were cloned (most likely still awaiting
tx reclaim)...
[The rest is considering future work instead since I got
repeatably EFAULT to tcpdump's recvfrom when I added
pskb_expand_head to deal with clones, so I separated that
into another, later patch]
...To counter that, I gave up on the fifth advantage:
5) When growing previous SACK block, less allocs for new skbs
are done, basically a new alloc is needed only when new hole
is detected and when the previous skb runs out of frags space
...which now only happens of if reclaim is fast enough to dispose
the clone before the SACK block comes in (the window is RTT long),
otherwise we'll have to alloc some.
With clones being handled I got these numbers (will be somewhat
worse without that), taken with fine-grained mibs:
Ilpo Järvinen [Tue, 25 Nov 2008 05:14:43 +0000 (21:14 -0800)]
tcp: make tcp_sacktag_one able to handle partial skb too
This is preparatory work for SACK combiner patch which may
have to count TCP state changes for only a part of the skb
because it will intentionally avoids splitting skb to SACKed
and not sacked parts.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
Ilpo Järvinen [Tue, 25 Nov 2008 05:12:28 +0000 (21:12 -0800)]
tcp: more aggressive skipping
I knew already when rewriting the sacktag that this condition
was too conservative, change it now since it prevent lot of
useless work (especially in the sack shifter decision code
that is being added by a later patch). This shouldn't change
anything really, just save some processing regardless of the
shifter.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
Ilpo Järvinen [Tue, 25 Nov 2008 05:03:43 +0000 (21:03 -0800)]
tcp: collapse more than two on retransmission
I always had thought that collapsing up to two at a time was
intentional decision to avoid excessive processing if 1 byte
sized skbs are to be combined for a full mtu, and consecutive
retransmissions would make the size of the retransmittee
double each round anyway, but some recent discussion made me
to understand that was not the case. Thus make collapse work
more and wait less.
It would be possible to take advantage of the shifting
machinery (added in the later patch) in the case of paged
data but that can be implemented on top of this change.
tcp_skb_is_last check is now provided by the loop.
I tested a bit (ss-after-idle-off, fill 4096x4096B xfer,
10s sleep + 4096 x 1byte writes while dropping them for
some a while with netem):
Eric Dumazet [Tue, 25 Nov 2008 00:07:50 +0000 (16:07 -0800)]
net: avoid a pair of dst_hold()/dst_release() in ip_push_pending_frames()
We can reduce pressure on dst entry refcount that slowdown UDP transmit
path on SMP machines. This pressure is visible on RTP servers when
delivering content to mediagateways, especially big ones, handling
thousand of streams. Several cpus send UDP frames to the same
destination, hence use the same dst entry.
This patch makes ip_push_pending_frames() steal the refcount its
callers had to take when filling inet->cork.dst.
This doesnt avoid all refcounting, but still gives speedups on SMP,
on UDP/RAW transmit path.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Mon, 24 Nov 2008 23:52:46 +0000 (15:52 -0800)]
net: avoid a pair of dst_hold()/dst_release() in ip_append_data()
We can reduce pressure on dst entry refcount that slowdown UDP transmit
path on SMP machines. This pressure is visible on RTP servers when
delivering content to mediagateways, especially big ones, handling
thousand of streams. Several cpus send UDP frames to the same
destination, hence use the same dst entry.
This patch makes ip_append_data() eventually steal the refcount its
callers had to take on the dst entry.
This doesnt avoid all refcounting, but still gives speedups on SMP,
on UDP/RAW transmit path
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
gen_kill_estimator() linear lists lookups are very slow, and e.g. while
deleting a large number of HTB classes soft lockups were reported. Here
is another try to fix this problem: this time internally, with rbtree,
so similarly to Jamal's hashing idea IIRC. (Looking for next hits could
be still optimized, but it's really fast as it is.)
Reported-by: Badalian Vyacheslav <slavon@bigtelecom.ru> Reported-by: Denys Fedoryshchenko <denys@visp.net.lb> Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Acked-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Mon, 24 Nov 2008 23:46:08 +0000 (15:46 -0800)]
pkt_sched: sch_drr: fix drr_dequeue loop()
Jarek Poplawski points out:
If all child qdiscs of sch_drr are non-work-conserving (e.g. sch_tbf)
drr_dequeue() will busy-loop waiting for skbs instead of leaving the
job for a watchdog. Checking for list_empty() in each loop isn't
necessary either, because this can never be true except the first time.
Using non-work-conserving qdiscs as children of DRR makes no sense,
simply bail out in that case.
Reported-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Mon, 24 Nov 2008 08:09:29 +0000 (00:09 -0800)]
net: Make sure BHs are disabled in sock_prot_inuse_add()
The rule of calling sock_prot_inuse_add() is that BHs must
be disabled. Some new calls were added where this was not
true and this tiggers warnings as reported by Ilpo.
Fix this by adding explicit BH disabling around those call sites,
or moving sock_prot_inuse_add() call inside an existing BH disabled
section.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Mon, 24 Nov 2008 07:24:32 +0000 (23:24 -0800)]
eth: Declare an optimized compare_ether_addr_64bits() function
Linus mentioned we could try to perform long word operations, even
on potentially unaligned addresses, on x86 at least. David mentioned
the HAVE_EFFICIENT_UNALIGNED_ACCESS test to handle this on all
arches that have efficient unailgned accesses.
I tried this idea and got nice assembly on 32 bits:
And very nice assembly on 64 bits of course (one xor, one shl)
Nice oprofile improvement in eth_type_trans(), 0.17 % instead of 0.41 %,
expected since we remove 8 instructions on a fast path.
This patch implements a compare_ether_addr_64bits() function, that
uses the CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS ifdef to efficiently
perform the 6 bytes comparison on all capable arches.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 24 Nov 2008 04:01:59 +0000 (20:01 -0800)]
axnet_cs: Fix build after net device ops ne2k conversion.
Commit 4e4fd4e485ad63a9074ff09a9b53ffc7a5c594ec ("ne2k: convert to
net_device_ops") exported some ei_* symbols from the 8390 library,
but the axnet_cs driver defines local static versions of the same
functions.
Rename them to avoid the namespace conflict.
Reported by Stephen Rothwell.
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 24 Nov 2008 01:34:03 +0000 (17:34 -0800)]
net: Make sure BHs are disabled in sock_prot_inuse_add()
The rule of calling sock_prot_inuse_add() is that BHs must
be disabled. Some new calls were added where this was not
true and this tiggers warnings as reported by Ilpo.
Fix this by adding explicit BH disabling around those call sites.
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexey Dobriyan [Mon, 24 Nov 2008 01:26:26 +0000 (17:26 -0800)]
net: fix tunnels in netns after ndo_ changes
dev_net_set() should be the very first thing after alloc_netdev().
"ndo_" changes turned simple assignment (which is OK to do before netns
assignment) into quite non-trivial operation (which is not OK, init_net was
used). This leads to incomplete initialisation of tunnel device in netns.
BUG: unable to handle kernel NULL pointer dereference at 00000004
IP: [<c02efdb5>] ip6_tnl_exit_net+0x37/0x4f
*pde = 00000000
Oops: 0000 [#1] PREEMPT DEBUG_PAGEALLOC
last sysfs file: /sys/class/net/lo/operstate
Eric Dumazet [Mon, 24 Nov 2008 01:22:55 +0000 (17:22 -0800)]
net: Convert TCP/DCCP listening hash tables to use RCU
This is the last step to be able to perform full RCU lookups
in __inet_lookup() : After established/timewait tables, we
add RCU lookups to listening hash table.
The only trick here is that a socket of a given type (TCP ipv4,
TCP ipv6, ...) can now flight between two different tables
(established and listening) during a RCU grace period, so we
must use different 'nulls' end-of-chain values for two tables.
We define a large value :
#define LISTENING_NULLS_BASE (1U << 29)
So that slots in listening table are guaranteed to have different
end-of-chain values than slots in established table. A reader can
still detect it finished its lookup in the right chain.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Gerrit Renker [Mon, 24 Nov 2008 00:10:23 +0000 (16:10 -0800)]
dccp: Header option insertion routine for feature-negotiation
The patch extends existing code:
* Confirm options divide into the confirmed value plus an optional preference
list for SP values. Previously only the preference list was echoed for SP
values, now the confirmed value is added as per RFC 4340, 6.1;
* length and sanity checks are added to avoid illegal memory (or NULL) access.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: David S. Miller <davem@davemloft.net>
Gerrit Renker [Mon, 24 Nov 2008 00:09:11 +0000 (16:09 -0800)]
dccp: Support for Mandatory options
Support for Mandatory options is provided by this patch, which will
be used by subsequent feature-negotiation patches.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Gerrit Renker [Mon, 24 Nov 2008 00:07:53 +0000 (16:07 -0800)]
dccp: Increase the scope of variable-length htonl/ntohl functions
This extends the scope of two available functions,
encode|decode_value_var, to work up to 6 (8) bytes, to match maximum
requirements in the RFC.
These functions are going to be used both by general option processing
and feature negotiation code, hence declarations have been put into
feat.h.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Gerrit Renker [Mon, 24 Nov 2008 00:04:59 +0000 (16:04 -0800)]
dccp: API to query the current TX/RX CCID
This provides function to query the current TX/RX CCID dynamically,
without reliance on the minisock value, using dynamic information
available in the currently loaded CCID module.
This query function is then used to
(a) provide the getsockopt part for getting/setting CCIDs via sockopts;
(b) replace the current test for "which CCID is in use" in probe.c.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: David S. Miller <davem@davemloft.net>
Gerrit Renker [Mon, 24 Nov 2008 00:02:31 +0000 (16:02 -0800)]
dccp: Set per-connection CCIDs via socket options
With this patch, TX/RX CCIDs can now be changed on a per-connection
basis, which overrides the defaults set by the global sysctl variables
for TX/RX CCIDs.
To make full use of this facility, the remaining patches of this patch
set are needed, which track dependencies and activate negotiated
feature values.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Sun, 23 Nov 2008 23:48:22 +0000 (15:48 -0800)]
net: af_netlink should update its inuse counter
In order to have relevant information for NETLINK protocol, in
/proc/net/protocols, we should use sock_prot_inuse_add() to
update a (percpu and pernamespace) counter of inuse sockets.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Duyck [Sat, 22 Nov 2008 05:30:24 +0000 (21:30 -0800)]
igb: do not use phy ops in ethtool test cleanup for non-copper parts
Currently the igb driver is experiencing a panic due to a null function
pointer being used during the cleanup of the ethtool looback test on
fiber/serdes parts. This patch prevents that and adds a check prior to
calling any phy function.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Scott Feldman [Sat, 22 Nov 2008 05:28:18 +0000 (21:28 -0800)]
enic: driver/firmware API updates
Add driver/firmware compatibility check.
Update firmware notify cmd to honor notify area size.
Add new version of init cmd.
Add link_down_cnt to notify area to track link down count.
Signed-off-by: Scott Feldman <scofeldm@cisco.com> Signed-off-by: David S. Miller <davem@davemloft.net>