Samba

Planet Samba

Here you will find the personal blogs of Samba developers (for those that keep them). More information about members can also be found on the Samba Team page.

July 24, 2015

David

Rapid Linux Kernel Dev/Test with QEMU, KVM and Dracut

Inspired by Stefan Hajnoczi's excellent blog post, I recently set about constructing an environment for rapid testing of Linux kernel changes, particularly focused on the LIO iSCSI target. Such an environment would help me in a number of ways:

  • Faster dev / test turnaround.
    • A modified kernel can be compiled and booted in a matter of seconds.
  • Improved resource utilisation.
    • No need to boot external test hosts or heavyweight VMs.
  • Simplified and speedier debugging.

My requirements were slightly different to Stefan's, in that:
  • I'd prefer to be lazy and use Dracut for initramfs generation.
  • I need a working network connection between VM and hypervisor system
    • The VM will act as the iSCSI target, the hypervisor as the initiator.

Starting with the Linux kernel, the first step is to build a bzImage:
~/> git clone \
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
hack, hack, hack.
~/linux/> make menuconfig
Set CONFIG_IP_PNP_DHCP=y and CONFIG_E1000=y to enable IP address assignment on boot.
~/linux/> make -j6
~/linux/> make modules
~/linux/> INSTALL_MOD_PATH=./mods make modules_install
~/linux/> sudo ln -s $PWD/mods/lib/modules/4.1.0-rc7+ /lib/modules/4.1.0-rc7+
This leaves us with a compressed kernel image file at arch/x86/boot/bzImage, and corresponding modules installed under mods/lib/modules/<kernel-version>, where <kernel-version> is 4.1.0-rc7+ in this example. The /lib/modules/4.1.0-rc7+ symlink allows Dracut to locate the modules.

The next step is to generate an initial RAM filesystem, or initramfs, which includes a minimal set of user-space utilities, and kernel modules needed for testing:

~/linux/> dracut --kver "4.1.0-rc7+" \
--add-drivers "iscsi_target_mod target_core_mod" \
--add-drivers "target_core_file target_core_iblock" \
--add-drivers "configfs" \
--install "ps grep netstat" \
--no-hostonly --no-hostonly-cmdline \
--modules "bash base shutdown network ifcfg" initramfs
...
*** Creating image file done ***

We now have an initramfs file in the current directory, with the following contents:
  • LIO kernel modules obtained from /lib/modules/4.1.0-rc7+, as directed via the --kver and --add-drivers parameters.
  • User-space shell, boot and network helpers, as directed via the --modules parameter.

We're now ready to use QEMU/KVM to boot our test kernel and initramfs:

~/linux/> qemu-kvm -kernel arch/x86/boot/bzImage \
-initrd initramfs \
-device e1000,netdev=network0 \
-netdev user,id=network0 \
-redir tcp:51550::3260 \
-append "ip=dhcp rd.shell=1 console=ttyS0" \
-nographic

This boots the test environment, with the kernel and initramfs previously generated:

[    3.216596] dracut Warning: dracut: FATAL: No or empty root= argument
[    3.217998] dracut Warning: dracut: Refusing to continue
...
Dropping to debug shell.

dracut:/#

From the dracut shell, confirm that the QEMU DHCP server assigned the VM an IP address:

dracut:/# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default
...
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
...
inet 10.0.2.15/24 brd 10.0.2.255 scope global eth0

Port 3260 (iSCSI) on this interface is forwarded to/from port 51550 on the hypervisor, as configured via the qemu-kvm -redir parameter.

Now onto LIO iSCSI target setup. First, load the appropriate kernel modules:

dracut:/# modprobe iscsi_target_mod
dracut:/# cat /proc/modules
iscsi_target_mod 246669 0 - Live 0xffffffffa006a000
target_core_mod 289004 1 iscsi_target_mod, Live 0xffffffffa000b000
configfs 22407 3 iscsi_target_mod,target_core_mod, Live 0xffffffffa0000000

LIO configuration requires a mounted configfs filesystem:

dracut:/# mount -t configfs configfs /sys/kernel/config/
dracut:/# cat /sys/kernel/config/target/version
Target Engine Core ConfigFS Infrastructure v4.1.0 on Linux/x86_64 on 4.1.0-rc1+

An iSCSI target can be provisioned by manipulating corresponding configfs entries. I used the lio_dump output on an existing setup as reference:

dracut:/# mkdir /sys/kernel/config/target/iscsi
dracut:/# echo -n 0 > /sys/kernel/config/target/iscsi/discovery_auth/enforce_discovery_auth
dracut:/# mkdir -p /sys/kernel/config/target/iscsi/<iscsi_iqn>/tpgt_1/np/10.0.2.15:3260
...

Finally, we're ready to connect to the LIO target using the local hypervisor port that forwards to the VM's virtual network adapter:


~/linux/> iscsiadm --mode discovery \
--type sendtargets \
--portal 127.0.0.1:51550
10.0.2.15:3260,1 iqn.2015-04.suse.arch:5eca2313-028d-435c-9131-53a5ab256a83

It works!

There are a few things that can be adjusted:
  • Port forwarding to the VM network is a bit fiddly - I'm now using a bridge/TAP configuration instead.
  • When dropping into the emergency boot shell, Dracut executes scripts carried under /lib/dracut/hooks/emergency/. This means that a custom script can be triggered on boot via:
    ~/linux/> dracut -i runme.sh /lib/dracut/hooks/emergency/02-runme.sh ...
  • It should be possible to have Dracut pull the kernel modules in from the temporary directory, but I wasn't able to get this working:
    ~/linux/> INSTALL_MOD_PATH=./mods make modules_install
    ~/linux/> dracut --kver "4.1.0-rc7+" --kmoddir ./mods/lib/...

Update 20150722:
  • Don't install kernel modules as root; set up a /lib/modules symlink for Dracut instead.
  • Link to bridge/TAP networking post.
  • Describe boot script usage. 

    July 24, 2015 11:23 AM

    July 12, 2015

    David

    QEMU/KVM Bridged Network with TAP interfaces

    In my previous post, Rapid Linux Kernel Dev/Test with QEMU, KVM and Dracut, I described how to build and boot a Linux kernel quickly, making use of port forwarding between hypervisor and guest VM for virtual network traffic.

    This post describes how to plumb the Linux VM directly into a hypervisor network, through the use of a bridge.

    Start by creating a bridge on the hypervisor system:

    > sudo /sbin/brctl addbr br0

    Clear the IP address on the network interface that you'll be bridging (e.g. eth0).
    Note: This will disable network traffic on eth0!
    > sudo ip addr flush dev eth0
    Add the interface to the bridge:
    > sudo /sbin/brctl addif br0 eth0

    Next up, create a TAP interface:
    > sudo /sbin/tunctl -u $(whoami)
    Set 'tap0' persistent and owned by uid 1001
    The -u parameter ensures that the current user will be able to connect to the TAP interface.

    Add the TAP interface to the bridge:
    > sudo /sbin/brctl addif br0 tap0

    Make sure everything is up:
    > sudo ip link set dev br0 up
    > sudo ip link set dev tap0 up

    The TAP interface is now ready for use. Assuming that a DHCP server is available on the bridged network, the VM can now obtain an IP address during boot via:
    > qemu-kvm -kernel arch/x86/boot/bzImage \
    -initrd initramfs \
    -device e1000,netdev=network0,mac=52:55:00:d1:55:01 \
    -netdev tap,id=network0,ifname=tap0,script=no,downscript=no \
    -append "ip=dhcp rd.shell=1 console=ttyS0" -nographic

    The MAC address is explicitly specified, so care should be taken to ensure its uniqueness.

    The DHCP server response details are printed alongside network interface configuration. E.g.
    [    3.792570] e1000: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
    [    3.796085] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
    [    3.812083] Sending DHCP requests ., OK
    [    4.824174] IP-Config: Got DHCP answer from 10.155.0.42, my address is 10.155.0.1
    [    4.825119] IP-Config: Complete:
    [    4.825476]     device=eth0, hwaddr=52:55:00:d1:55:01, ipaddr=10.155.0.1, mask=255.255.0.0, gw=10.155.0.254
    [    4.826546]     host=rocksolid-sles, domain=suse.de, nis-domain=suse.de
    ...

    Didn't get an IP address? There are a few things to check:
    • Confirm that the kernel is built with boot-time DHCP client (CONFIG_IP_PNP_DHCP=y) and E1000 network driver (CONFIG_E1000=y) support.
    • Check that the -device and -netdev arguments specify a valid e1000 TAP interface.
    • Ensure that ip=dhcp is provided as a kernel boot parameter, and that the DHCP server is up and running.
    Happy hacking!

    July 12, 2015 06:52 PM

    July 08, 2015

    Rusty

    The Megatransaction: Why Does It Take 25 Seconds?

    Last night f2pool mined a 1MB block containing a single 1MB transaction.  This scooped up some of the spam which has been going to various weakly-passworded “brainwallets”, gaining them 0.5569 bitcoins (on top of the normal 25 BTC subsidy).  You can see the megatransaction on blockchain.info.

    It was widely reported to take about 25 seconds for bitcoin core to process this block: this is far worse than my “2 seconds per MB” result in my last post, which was considered a pretty bad case.  Let’s look at why.

    How Signatures Are Verified

    The algorithm to check a transaction input (of this form) looks like this:

    1. Strip the other inputs from the transaction.
    2. Replace the input script we’re checking with the script of the output it’s trying to spend.
    3. Hash the resulting transaction with SHA256, then hash the result with SHA256 again.
    4. Check the signature correctly signed that hash result.

    Now, for a transaction with 5570 inputs, we have to do this 5570 times.  And the bitcoin core code does this by making a copy of the transaction each time, and using the marshalling code to hash that; it’s not a huge surprise that we end up spending 20 seconds on it.
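    A minimal Python sketch of the hashing cost described above (the real code is C++ inside bitcoin core; the ~6k stripped-transaction size and 5570-input count are figures from this post, and the payload bytes are placeholders):

```python
import hashlib

def double_sha256(data: bytes) -> bytes:
    # Bitcoin's standard hash: SHA256 applied twice (step 3 above).
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

# One hash per input: each input is checked against its own stripped
# copy of the transaction (steps 1 and 2), modelled here as a ~6 KB
# placeholder blob.
stripped_copies = [b"\x00" * 6000] * 5570
digests = [double_sha256(tx) for tx in stripped_copies]

assert len(digests) == 5570 and all(len(d) == 32 for d in digests)
```

    Even in Python, raw double-SHA256 over 5570 six-kilobyte blobs finishes in well under a second; per the post, the 20 seconds observed in bitcoin core comes from copying and re-marshalling the transaction each time, not from the hashing itself.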

    How Fast Could Bitcoin Core Be If Optimized?

    Once we strip the inputs, the result is only about 6k long; hashing 6k 5570 times takes about 265 milliseconds (on my modern i3 laptop).  We have to do some work to change the transaction each time, but we should end up under half a second without any major backflips.

    Problem solved?  Not quite….

    This Block Isn’t The Worst Case (For An Optimized Implementation)

    As I said above, the amount we have to hash is about 6k; if a transaction has larger outputs, that number changes.  We can fit in fewer inputs though.  A simple simulation shows the worst case for a 1MB transaction has 3300 inputs, and 406000 byte output(s): simply doing the hashing for input signatures takes about 10.9 seconds.  That’s only about two or three times faster than the naive bitcoind implementation.

    This problem is far worse if blocks were 8MB: an 8MB transaction with 22,500 inputs and 3.95MB of outputs takes over 11 minutes to hash.  If you can mine one of those, you can keep competitors off your heels forever, and own the bitcoin network… Well, probably not.  But there’d be a lot of emergency patching, forking and screaming…

    Short Term Steps

    An optimized implementation in bitcoind is a good idea anyway, and there are three obvious paths:

    1. Optimize the signature hash path to avoid the copy, and hash in place as much as possible.
    2. Use the Intel and ARM optimized SHA256 routines, which increase SHA256 speed by about 80%.
    3. Parallelize the input checking for large numbers of inputs.

    Longer Term Steps

    A soft fork could introduce an OP_CHECKSIG2, which hashes the transaction in a different order.  In particular, it should hash the input script replacement at the end, so the “midstate” of the hash can be trivially reused.  This doesn’t entirely eliminate the problem, since the sighash flags can require other permutations of the transaction; these would have to be carefully explored (or only allowed with OP_CHECKSIG).

    This soft fork could also place limits on how big an OP_CHECKSIG-using transaction could be.

    Such a change will take a while: there are other things which would be nice to change for OP_CHECKSIG2, such as new sighash flags for the Lightning Network, and removing the silly DER encoding of signatures.

    July 08, 2015 03:09 AM

    July 06, 2015

    Rusty

    Bitcoin Core CPU Usage With Larger Blocks

    Since I was creating large blocks (41662 transactions), I added a little code to time how long they take once received (on my laptop, which is only an i3).

    The obvious place to look is CheckBlock: a simple 1MB block takes a consistent 10 milliseconds to validate, and an 8MB block took 79 to 80 milliseconds, which is nice and linear.  (A 17MB block took 171 milliseconds).

    Weirdly, that’s not the slow part: promoting the block to the best block (ActivateBestChain) takes 1.9-2.0 seconds for a 1MB block, and 15.3-15.7 seconds for an 8MB block.  At least it’s scaling linearly, but it’s just slow.
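    The CheckBlock timings above really are linear; a quick sanity check in Python, using the measurements quoted in this post:

```python
# CheckBlock validation scales with block size at roughly 10 ms per MB.
ms_per_mb = 10.0
predicted_8mb = 8 * ms_per_mb    # measured: 79-80 ms
predicted_17mb = 17 * ms_per_mb  # measured: 171 ms

assert 79 <= predicted_8mb <= 80
assert abs(predicted_17mb - 171) <= 2
```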

    So, 16 Seconds Per 8MB Block?

    I did some digging.  Just invalidating and revalidating the 8MB block only took 1 second, so something about receiving a fresh block makes it worse. I spent a day or so wrestling with benchmarking[1]…

    Indeed, ConnectTip does the actual script evaluation: CheckBlock() only does a cursory examination of each transaction.  I’m guessing bitcoin core is not smart enough to parallelize a chain of transactions like mine, hence the 2 seconds per MB.  On normal transaction patterns even my laptop should be about 4 times faster than that (but I haven’t actually tested it yet!).

    So, 4 Seconds Per 8MB Block?

    But things are going to get better: I hacked in the currently-disabled libsecp256k1, and the time for the 8MB ConnectTip dropped from 18.6 seconds to 6.5 seconds.

    So, 1.6 Seconds Per 8MB Block?

    I re-enabled optimization after my benchmarking, and the result was 4.4 seconds; that’s libsecp256k1, and an 8MB block.

    Let’s Say 1.1 Seconds for an 8MB Block

    This is with some assumptions about parallelism; and remember this is on my laptop which has a fairly low-end CPU.  While you may not be able to run a competitive mining operation on a Raspberry Pi, you can pretty much ignore normal verification times in the blocksize debate.


     

    [1] I turned on -debug=bench, which produced impenetrable and seemingly useless results in the log.

    So I added a print with a sleep, so I could run perf.  Then I disabled optimization, so I’d get understandable backtraces with perf.  Then I rebuilt perf because Ubuntu’s perf doesn’t demangle C++ symbols, which is part of the kernel source package. (Are we having fun yet?).  I even hacked up a small program to help run perf on just that part of bitcoind.   Finally, after perf failed me (it doesn’t show 100% CPU, no idea why; I’d expect to see main in there somewhere…) I added stderr prints and ran strace on the thing to get timings.

    July 06, 2015 09:58 PM

    July 03, 2015

    Rusty

    Wrapper for running perf on part of a program.

    Linux’s perf competes with early git for the title of least-friendly Linux tool.  Because it’s tied to kernel versions, and the interface changes fairly randomly, you can never figure out how to use the version you need to use (hint: always use -g).

    But when it works, it’s very useful.  Recently I wanted to figure out where bitcoind was spending its time processing a block; because I’m a cool kid, I didn’t use gprof, I used perf.  The problem is that I only want information on that part of bitcoind.  To start with, I put a sleep(30) and a big printf in the source, but that got old fast.

    Thus, I wrote “perfme.c”.  Compile it (requires some trivial CCAN headers) and link perfme-start and perfme-stop to the binary.  By default it runs/stops perf record on its parent, but an optional pid arg can be used for other things (eg. if your program is calling it via system(), the shell will be the parent).

    July 03, 2015 03:19 AM

    June 25, 2015

    Rusty

    Hashing Speed: SHA256 vs Murmur3

    So I did some IBLT research (as posted to bitcoin-dev ) and I lazily used SHA256 to create both the temporary 48-bit txids, and from them to create a 16-bit index offset.  Each node has to produce these for every bitcoin transaction ID it knows about (ie. its entire mempool), which is normally less than 10,000 transactions, but we’d better plan for 1M given the coming blopockalypse.

    For txid48, we hash an 8 byte seed with the 32-byte txid; I ignored the 8 byte seed for the moment, and measured various implementations of SHA256 hashing 32 bytes on my Intel Core i3-5010U CPU @ 2.10GHz laptop (though note we’d be hashing 8 extra bytes for IBLT): (implementation in CCAN)

    1. Bitcoin’s SHA256: 527.7+/-0.9 nsec
    2. Optimizing the block ending on bitcoin’s SHA256: 500.4+/-0.66 nsec
    3. Intel’s asm rorx: 314.1+/-0.3 nsec
    4. Intel’s asm SSE4: 337.5+/-0.5 nsec
    5. Intel’s asm RORx-x8ms: 458.6+/-2.2 nsec
    6. Intel’s asm AVX: 336.1+/-0.3 nsec

    So, if you have 1M transactions in your mempool, expect it to take about 0.62 seconds of hashing to calculate the IBLT.  This is too slow (though it’s fairly trivially parallelizable).  However, we just need a universal hash, not a cryptographic one, so I benchmarked murmur3_x64_128:

    1. Murmur3-128: 23 nsec

    That’s more like 0.046 seconds of hashing, which seems like enough of a win to add a new hash to the mix.
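    A sketch of the txid48 construction in Python (the exact layout isn't spelled out in this post, so the seed||txid ordering and 6-byte truncation below are my assumptions):

```python
import hashlib

def txid48(seed: bytes, txid: bytes) -> bytes:
    # Assumed construction: SHA256 over seed||txid, truncated to 48 bits.
    # A universal (non-cryptographic) hash like Murmur3 could replace
    # SHA256 here, which is the ~20x per-hash speedup measured above.
    assert len(seed) == 8 and len(txid) == 32
    return hashlib.sha256(seed + txid).digest()[:6]

tid = txid48(b"\x01" * 8, b"\xab" * 32)
assert len(tid) == 6
```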

    June 25, 2015 07:51 AM

    June 19, 2015

    Rusty

    Mining on a Home DSL connection: latency for 1MB and 8MB blocks

    I like data.  So when Patrick Strateman handed me a hacky patch for a new testnet with a 100MB block limit, I went to get some.  I added 7 digital ocean nodes, another hacky patch to prevent sendrawtransaction from broadcasting, and a quick utility to create massive chains of transactions.

    My home DSL connection is 11Mbit down, and 1Mbit up; that’s the fastest I can get here.  I was CPU mining on my laptop for this test, while running tcpdump to capture network traffic for analysis.  I didn’t measure the time taken to process the blocks on the receiving nodes, just the first propagation step.

    1 Megabyte Block

    Naively, it should take about 10 seconds to send a 1MB block up my DSL line from first packet to last.  Here’s what actually happens, in seconds for each node:

    1. 66.8
    2. 70.4
    3. 71.8
    4. 71.9
    5. 73.8
    6. 75.1
    7. 75.9
    8. 76.4

    The packet dump shows they’re all pretty much sprayed out simultaneously (bitcoind may do the writes in order, but the network stack interleaves them pretty well).  That’s why it’s 67 seconds at best before the first node receives my block (a bit longer, since that’s when the packet left my laptop).
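    The back-of-envelope arithmetic behind that 67 seconds (my numbers, assuming the uplink is shared evenly between the simultaneous sends):

```python
# A 1 MB block sprayed to 8 peers at once over a 1 Mbit/s uplink:
# all copies finish at roughly the same time, so the first peer only
# has the complete block once nearly all 8 MB of traffic has gone out.
uplink_bits_per_sec = 1_000_000
block_bytes = 1_000_000
peers = 8

seconds = block_bytes * 8 * peers / uplink_bits_per_sec
assert seconds == 64.0  # measured first arrival: 66.8 s (plus overhead)
```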

    8 Megabyte Block

    I increased my block size, and one node dropped out, so this isn’t quite the same, but the times to send to each node are about 8 times worse, as expected:

    1. 501.7
    2. 524.1
    3. 536.9
    4. 537.6
    5. 538.6
    6. 544.4
    7. 546.7

    Conclusion

    Using the rough formula of 1-exp(-t/600), I would expect orphan rates of 10.5% generating 1MB blocks, and 56.6% with 8MB blocks; that’s a huge cut in expected profits.
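    The orphan-rate formula in code, assuming block discovery is a Poisson process with a 600-second mean interval, and plugging in the first-arrival times measured above:

```python
import math

def orphan_rate(propagation_seconds: float) -> float:
    # Probability someone else finds a block while ours is still propagating.
    return 1 - math.exp(-propagation_seconds / 600)

# First-arrival times from the measurements above.
assert abs(orphan_rate(66.8) - 0.105) < 0.005   # 1MB block: ~10.5%
assert abs(orphan_rate(501.7) - 0.566) < 0.005  # 8MB block: ~56.6%
```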

    Workarounds

    • Get a faster DSL connection.  Though even an uplink 10 times faster would mean 1.1% orphan rate with 1MB blocks, or 8% with 8MB blocks.
    • Only connect to a single well-connected peer (-maxconnections=1), and hope they propagate your block.
    • Refuse to mine any transactions, and just collect the block reward.  Doesn’t help the bitcoin network at all though.
    • Join a large pool.  This is what happens in practice, but raises a significant centralization problem.

    Fixes

    • We need bitcoind to be smarter about ratelimiting in these situations, and stream serially.  Done correctly (which is hard), it could also help bufferbloat which makes running a full node at home so painful when it propagates blocks.
    • Some kind of block compression, along the lines of Gavin’s IBLT idea. I’ve done some preliminary work on this, and it’s promising, but far from trivial.

     

    June 19, 2015 02:37 AM

    June 03, 2015

    Rusty

    What Transactions Get Crowded Out If Blocks Fill?

    What happens if bitcoin blocks fill?  Miners choose transactions with the highest fees, so low fee transactions get left behind.  Let’s look at what makes up blocks today, to try to figure out which transactions will get “crowded out” at various thresholds.

    Some assumptions need to be made here: we can’t automatically tell the difference between me taking a $1000 output and paying you 1c, and me paying you $999.99 and sending myself the 1c change.  So my first attempt was very conservative: only look at transactions with two or more outputs which were under the given thresholds (I used a nice round $200 / BTC price throughout, for simplicity).

    (Note: I used bitcoin-iterate to pull out transaction data, and rebuild blocks without certain transactions; you can reproduce the csv files in the blocksize-stats directory if you want).

    Paying More Than 1 Person Under $1 (< 500000 Satoshi)

    Here’s the result (against the current blocksize):

    Sending 2 Or More Sub-$1 Outputs

    Let’s zoom in to the interesting part, first, since there’s very little difference before 220,000 (February 2013).  You can see that only about 18% of transactions are sending less than $1 and getting less than $1 in change:

    Since March 2013…

    Paying Anyone Under 1c, 10c, $1

    The above graph doesn’t capture the case where I have $100 and send you 1c.   If we eliminate any transaction which has any output less than various thresholds, we’ll catch that. The downside is that we capture the “sending myself tiny change” case, but I’d expect that to be rarer:

    Blocksizes Without Small Output Transactions

    This eliminates far more transactions.  We can see only 2.5% of the block size is taken by transactions with 1c outputs (the dark red line following the block “current blocks” line), but the green line shows about 20% of the block used for 10c transactions.  And about 45% of the block is transactions moving $1 or less.

    Interpretation: Hard Landing Unlikely, But Microtransactions Lose

    If the block size doesn’t increase (or doesn’t increase in time): we’ll see transactions get slower, and fees become the significant factor in whether your transaction gets processed quickly.  People will change behaviour: I’m not going to spend 20c to send you 50c!

    Because block finding is highly variable and many miners are capping blocks at 750k, we see backlogs at times already; these bursts will happen with increasing frequency from now on.  This will put pressure on SatoshiDice and similar services, who will be highly incentivized to use StrawPay or roll their own channel mechanism for off-blockchain microtransactions.

    I’d like to know what timescale this happens on, but the graph shows that we grow (and occasionally shrink) in bursts.  A logarithmic graph prepared by Peter R of bitcointalk.org suggests that we hit 1M mid-2016 or so; expect fee pressure to bend that graph downwards soon.

    The bad news is that even if fees hit (say) 25c and that prevents all the sub-$1 transactions, we only double our capacity, giving us perhaps another 18 months. (At that point miners are earning $1000 from transaction fees as well as $5000 (@ $200/BTC) from block reward, which is nice for them I guess.)

    My Best Guess: Larger Blocks Desirable Within 2 Years, Needed by 3

    Personally I think 5c is a reasonable transaction fee, but I’d prefer not to see it until we have decentralized off-chain alternatives.  I’d be pretty uncomfortable with a 25c fee unless the Lightning Network was so ubiquitous that I only needed to pay it twice a year.  Higher than that would have me reaching for my credit card to charge my Lightning Network account :)

    Disclaimer: I Work For BlockStream, on Lightning Networks

    Lightning Networks are a marathon, not a sprint.  The development timeframes in my head are even vaguer than the guesses above.  I hope it’s part of the eventual answer, but it’s not the bandaid we’re looking for.  I wish it were different, but we’re going to need other things in the mean time.

    I hope this provided useful facts, whatever your opinions.

    June 03, 2015 03:57 AM

    Current Blocksize, by graphs.

    I used bitcoin-iterate and gnumeric to render the current bitcoin blocksizes, and here are the results.

    My First Graph: A Moment of Panic

    This is block sizes up to yesterday; I’ve asked gnumeric to derive an exponential trend line from the data (in black; the red one is linear).

    Woah! We hit 1M blocks in a month! PAAAANIC!

    That trend line hits 1000000 at block 363845.5, which we’d expect in about 32 days time!  This is what is freaking out so many denizens of the Bitcoin Subreddit. I also just saw a similar inaccurate [correction: misleading] graph reshared by Mike Hearn on G+ :(
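    For reference, the conversion from block height to calendar time (the current height below is my back-derived estimate, not a figure from the post):

```python
# Blocks arrive every ~10 minutes on average, so ~144 per day.
blocks_per_day = 24 * 60 / 10

trend_hits_1mb = 363845.5   # from the trend line above
current_height = 359237     # assumed: back-derived from "about 32 days"

days = (trend_hits_1mb - current_height) / blocks_per_day
assert 31 < days < 33
```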

    But Wait A Minute

    That trend line says we’re on 800k blocks today, and we’re clearly not.  Let’s add a 6 hour moving average:

    Oh, we’re only halfway there….

    In fact, if we cluster into 36 blocks (ie. 6 hours worth), we can see how misleading the terrible exponential fit is:

    What! We’re already over 1M blocks?? Maths, you lied to me!

    Clearer Graphs: 1 week Moving Average

    Actual Weekly Running Average Blocksize

    So, not time to panic just yet, though we’re clearly growing, and in unpredictable bursts.

    June 03, 2015 02:34 AM

    June 01, 2015

    Rusty

    Block size: rate of internet speed growth since 2008?

    I’ve been trying not to follow the Great Blocksize Debate raging on reddit.  However, the lack of any concrete numbers has kind of irked me, so let me add one for now.

    If we assume bandwidth is the main problem with running nodes, let’s look at average connection growth rates since 2008.  Google led me to NetMetrics (who seem to charge), and Akamai’s State Of The Internet (who don’t).  So I used the latter, of course:

    Akamai’s Average Connection Speed Chart Q4/07 to Q4/14

    I tried to pick a range of countries, and here are the results:

    Country       Q4/14 speed as % of Q4/07   Per annum growth
    Australia     348%                        19.5%
    Brazil        349%                        19.5%
    China         481%                        25.2%
    Philippines   258%                        14.5%
    UK            333%                        18.8%
    US            304%                        17.2%
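    The per-annum column is just the seventh root of the 7-year ratio; a quick check in Python (my arithmetic):

```python
def per_annum(pct_of_start: float, years: int = 7) -> float:
    # Ending at 348% of the starting speed after 7 years means
    # 3.48 ** (1/7) - 1 growth per year.
    return (pct_of_start / 100) ** (1 / years) - 1

assert abs(per_annum(348) - 0.195) < 0.001  # Australia
assert abs(per_annum(481) - 0.252) < 0.001  # China
assert abs(per_annum(304) - 0.172) < 0.001  # US
```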

     

    Countries which had best bandwidth grew about 17% a year, so I think that’s the best model for future growth patterns (China is now where the US was 7 years ago, for example).

    If bandwidth is the main centralization concern, you’ll want block growth below 15%. That implies we could jump the cap to 3MB next year, and 15% thereafter. Or if you’re less conservative, 3.5MB next year, and 17% thereafter.

    June 01, 2015 01:20 AM

    May 25, 2015

    David

    Using the Azure File Service on Linux

    The Microsoft Azure File Service is a new SMB shared-storage service offered on the Microsoft Azure public cloud.

    The service allows for the instant provisioning of file shares for private access by cloud provisioned VMs using the SMB 2.1 protocol, and additionally supports public access via a new REST interface.

    Update 2015-05-25: File shares can now also be provisioned from Linux using Elasto.



    Linux VMs deployed on Azure can make use of this service using the Linux Kernel CIFS client. The kernel client must be configured to support and use the SMB 2.1 protocol dialect:
    • CONFIG_CIFS_SMB2 must be enabled in the kernel configuration at build time
      • Use
        # zcat /proc/config.gz | grep CONFIG_CIFS_SMB2
        to check this on a running system.
    • The vers=2.1 mount.cifs parameter must be provided at mount time.
    • Furthermore, the Azure storage account and access key must be provided as username and password.

    # mount.cifs -o vers=2.1,user=smb //smb.file.core.windows.net/share /share/
    Password for smb@//smb.file.core.windows.net/share: ******...
    # df -h /share/
    Filesystem Size Used Avail Use% Mounted on
    //smb.file.core.windows.net/share 5.0T 0 5.0T 0% /share

    This feature will be supported with the upcoming release of SUSE Linux Enterprise Server 12, and future openSUSE releases.

    Disclaimer: I work in the Labs department at SUSE.

    May 25, 2015 01:14 PM

    May 23, 2015

    David

    Azure File Service IO with Elasto on Linux

    In an earlier post I described the basics of the Microsoft Azure File Service, and how it can be used on Linux with the cifs.ko kernel client.

    Since that time I've been hacking away on the Elasto cloud storage client, to the point that it now (with version 0.6.0) supports Azure File Service share provisioning as well as file and directory IO.


    To play with Elasto yourself:
    • Install the packages
    • Download your Azure PublishSettings credentials
    • Run
      elasto_cli -s Azure_PublishSettings_File -u afs://
    Keep in mind that Elasto is still far from mature, so don't be surprised if it corrupts your data or causes a fire.
    With the warning out of the way, I'd like to thank:
    • My employer SUSE Linux, for supporting my Elasto development efforts during Hack Week.
    • Samba Experience conference organisers, for giving me the chance to talk about the project.
    • Kdenlive developers, for writing great video editing software.

    May 23, 2015 10:12 PM

    April 30, 2015

    Rusty

    Some bitcoin mempool data: first look

    Previously I discussed the use of IBLTs (on the pettycoin blog).  Kalle and I got some interesting, but slightly different results; before I revisited them I wanted some real data to play with.

    Finally, a few weeks ago I ran 4 nodes for a week, logging incoming transactions and the contents of the mempools when we saw a block.  This gives us some data to chew on when tuning any fast block sync mechanism; here are my first impressions looking at the data (which is available on github).

    These graphs are my first look; in blue is the number of txs in the block, and in purple stacked on top is the number of txs which were left in the mempool after we took those away.

    The good news is that all four sites are very similar; there’s small variance across these nodes (three are in Digital Ocean data centres and one is behind two NATs and a wireless network at my local coworking space).

    The bad news is that there are spikes of very large mempools around block 352,800; a series of 731kb blocks which I’m guessing is some kind of soft limit for some mining software [EDIT: 750k is the default soft block limit; reported in 1024-byte quantities as blockchain.info does, this is 732k.  Thanks sipa!].  Our ability to handle this case will depend very much on heuristics for guessing which transactions are likely candidates to be in the block at all (I’m hoping it’s as simple as first-seen transactions are most likely, but I haven’t tested yet).
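    The unit conversion in sipa's correction, spelled out (my arithmetic):

```python
# A 750,000 byte soft limit, reported in 1024-byte kilobytes the way
# blockchain.info does, shows up as ~732k.
soft_limit_bytes = 750 * 1000
assert round(soft_limit_bytes / 1024) == 732
```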

    Transactions in Mempool and in Blocks: Australia (poor connection)

    Transactions in Mempool and in Blocks: Singapore

    Transactions in Mempool and in Blocks: San Francisco

    Transactions in Mempool and in Blocks: San Francisco (using Relay Network)

    April 30, 2015 12:26 PM

    April 14, 2015

    Chris

    Samba in the Wild

    Every now and again, Samba shows up somewhere unexpected.
    Here it is on a sidewalk:

    Samba on the Sidewalk

    Here it is again at a restaurant:

    April 14, 2015 07:01 PM

    April 08, 2015

    Rusty

    Lightning Networks Part IV: Summary

    This is the fourth part of my series of posts explaining the bitcoin Lightning Networks 0.5 draft paper.  See Part I, Part II and Part III.

    The key revelation of the paper is that we can have a network of arbitrarily complicated transactions, such that they aren’t on the blockchain (and thus are fast, cheap and extremely scalable), but at every point are ready to be dropped onto the blockchain for resolution if there’s a problem.  This is genuinely revolutionary.

    It also vindicates Satoshi’s insistence on the generality of the Bitcoin scripting system.  And though it’s long been suggested that bitcoin would become a clearing system on which genuine microtransactions would be layered, it was unclear that we were so close to having such a system in bitcoin already.

    Note that the scheme requires some solution to malleability to allow chains of transactions to be built (this is a common theme, so likely to be mitigated in a future soft fork), but Gregory Maxwell points out that it also wants selective malleability, so transactions can be replaced without invalidating the HTLCs which are spending their outputs.  Thus it proposes new signature flags, which will require active debate, analysis and another soft fork.

    There is much more to discover in the paper itself: recommendations for lightning network routing, the node charging model, a risk summary, the specifics of the softfork changes, and more.

    I’ll leave you with a brief list of requirements to make Lightning Networks a reality:

    1. A soft-fork is required, to protect against malleability and to allow new signature modes.
    2. A new peer-to-peer protocol needs to be designed for the lightning network, including routing.
    3. Blame and rating systems are needed for lightning network nodes.  You don’t have to trust them, but it sucks if they go down as your money is probably stuck until the timeout.
    4. More refinements (eg. relative OP_CHECKLOCKTIMEVERIFY) to simplify and tighten timeout times.
    5. Wallets need to learn to use this, with UI handling of things like timeouts and fallbacks to the bitcoin network (sorry, your transaction failed, you’ll get your money back in N days).
    6. You need to be online every 40 days to check that an old HTLC hasn’t leaked, which will require some alternate solution for occasional users (shut down channel, have some third party, etc).
    7. A server implementation needs to be written.

    That’s a lot of work!  But it’s all simply engineering from here, just as bitcoin was once the paper was released.  I look forward to seeing it happen (and I’m confident it will).

    April 08, 2015 03:59 AM

    April 06, 2015

    Rusty

    Lightning Networks Part III: Channeling Contracts

    This is the third part of my series of posts explaining the bitcoin Lightning Networks 0.5 draft paper.

    In Part I I described how a Poon-Dryja channel uses a single in-blockchain transaction to create off-blockchain transactions which can be safely updated by either party (as long as both agree), with fallback to publishing the latest versions to the blockchain if something goes wrong.

    In Part II I described how Hashed Timelocked Contracts allow you to safely make one payment conditional upon another, so payments can be routed across untrusted parties using a series of transactions with decrementing timeout values.

    Now we’ll join the two together: encapsulate Hashed Timelocked Contracts inside a channel, so they don’t have to be placed in the blockchain (unless something goes wrong).

    Revision: Why Poon-Dryja Channels Work

    Here’s half of a channel setup between me and you where I’m paying you 1c: (there’s always a mirror setup between you and me, so it’s symmetrical)

    Half a channel: we will invalidate transaction 1 (in favour of a new transaction 2) to send funds.

    The system works because after we agree on a new transaction (eg. to pay you another 1c), you revoke the old one by handing me your temporary private key to unlock that 1c output.  Now if you ever released Transaction 1, I could spend both outputs.  If we want to add a new output to Transaction 1, we need to be able to make it similarly stealable.

    Adding a 1c HTLC Output To Transaction 1 In The Channel

    I’m going to send you 1c now via a HTLC (which means you’ll only get it if the riddle is answered; if it times out, I get the 1c back).  So we replace transaction 1 with transaction 2, which has three outputs: $9.98 to me, 1c to you, and 1c to the HTLC: (once we agree on the new transactions, we invalidate transaction 1 as detailed in Part I)

    Our Channel With an Output for an HTLC

    Note that you supply another separate signature (sig3) for this output, so you can reveal that private key later without giving away any other output.

    We modify our previous HTLC design so that you revealing the sig3 private key would allow me to steal this output. We do this the same way we did for that 1c going to you: send the output via a timelocked mutually signed transaction.  But there are two transaction paths in an HTLC: the got-the-riddle path and the timeout path, so we need to insert those timelocked mutually signed transactions in both of them.  First let’s append a 1 day delay to the timeout path:

    Timeout path of HTLC, with locktime so it can be stolen once you give me your sig3.

    Similarly, we need to append a timelocked transaction on the “got the riddle solution” path, which now needs my signature as well (otherwise you could create a replacement transaction and bypass the timelocked transaction):

    Full HTLC: If you reveal Transaction 2 after we agree it’s been revoked, and I have your sig3 private key, I can spend that output before you can, down either the settlement or timeout paths.

    Remember The Other Side?

    Poon-Dryja channels are symmetrical, so the full version has a matching HTLC on the other side (except with my temporary keys, so you can catch me out if I use a revoked transaction).  Here’s the full diagram, just to be complete:

    A complete lightning network channel with an HTLC, containing a glorious 13 transactions.

    Closing The HTLC

    When an HTLC is completed, we just update transaction 2, and don’t include the HTLC output.  The funds either get added to your output (R value revealed before timeout) or my output (timeout).

    Note that we can have an arbitrary number of independent HTLCs in progress at once, and open and/or close as many in each transaction update as both parties agree to.

    Keys, Keys Everywhere!

    Each output for a revocable transaction needs to use a separate address, so we can hand the private key to the other party.  We use two disposable keys for each HTLC[1], and every new HTLC will change one of the other outputs (either mine, if I’m paying you, or yours if you’re paying me), so that needs a new key too.  That’s 3 keys, doubled for the symmetry, to give 6 keys per HTLC.

    Adam Back pointed out that we can actually implement this scheme without the private key handover, and instead sign a transaction for the other side which gives them the money immediately.  This would permit more key reuse, but means we’d have to store these transactions somewhere on the off chance we needed them.

    Storing just the keys is smaller, but more importantly, Section 6.2 of the paper describes using BIP 32 key hierarchies so the disposable keys are derived: after a while, you only need to store one key for all the keys the other side has given you.  This is vastly more efficient than storing a transaction for every HTLC, and indicates the scale (thousands of HTLCs per second) the authors are thinking of.

    Next: Conclusion

    My next post will be a TL;DR summary, and some more references to the implementation details and possibilities provided by the paper.

     


    [1] The new sighash types are fairly loose, and thus allow you to attach a transaction to a different parent if it uses the same output addresses.  I think we could re-use the same keys in both paths if we ensure that the order of keys required is reversed for one, but we’d still need 4 keys, so it seems a bit too tricky.

    April 06, 2015 11:21 AM

    April 01, 2015

    Rusty

    Lightning Networks Part II: Hashed Timelock Contracts (HTLCs)

    In Part I, we demonstrated Poon-Dryja channels; a generalized channel structure which used revocable transactions to ensure that old transactions wouldn’t be reused.

    A channel from me<->you would allow me to efficiently send you 1c, but that doesn’t scale since it takes at least one on-blockchain transaction to set up each channel. The solution to this is to route funds via intermediaries;  in this example we’ll use the fictitious “MtBox”.

    If I already have a channel with MtBox’s Payment Node, and so do you, that lets me reliably send 1c to MtBox without (usually) needing the blockchain, and it lets MtBox send you 1c with similar efficiency.

    But it doesn’t give me a way to force them to send it to you; I have to trust them.  We can do better.

    Bonding Unrelated Transactions using Riddles

    For simplicity, let’s ignore channels for the moment.  Here’s the “trust MtBox” solution:

    I send you 1c via MtBox; simplest possible version, using two independent transactions. I trust MtBox to generate its transaction after I send it mine.

    What if we could bond these transactions together somehow, so that when you spend the output from the MtBox transaction, that automatically allows MtBox to spend the output from my transaction?

    Here’s one way. You send me a riddle question to which nobody else knows the answer: eg. “What’s brown and sticky?”.  I then promise MtBox the 1c if they answer that riddle correctly, and tell MtBox that you know.

    MtBox doesn’t know the answer, so it turns around and promises to pay you 1c if you answer “What’s brown and sticky?”. When you answer “A stick”, MtBox can pay you 1c knowing that it can collect the 1c off me.

    The bitcoin blockchain is really good at riddles; in particular “what value hashes to this one?” is easy to express in the scripting language. So you pick a random secret value R, then hash it to get H, then send me H.  My transaction’s 1c output requires MtBox’s signature, and a value which hashes to H (ie. R).  MtBox adds the same requirement to its transaction output, so if you spend it, it can get its money back from me:

    Two Independent Transactions, Connected by A Hash Riddle.

    Handling Failure Using Timeouts

    This example is too simplistic; when MtBox’s PHP script stops processing transactions, I won’t be able to get my 1c back if I’ve already published my transaction.  So we use a familiar trick from Part I, a timeout transaction which after (say) 2 days, returns the funds to me.  This output needs both my and MtBox’s signatures, and MtBox supplies me with the refund transaction containing the timeout:

    Hash Riddle Transaction, With Timeout

    MtBox similarly needs a timeout in case you disappear.  And it needs to make sure it gets the answer to the riddle from you within that 2 days, otherwise I might use my timeout transaction and it can’t get its money back.  To give plenty of margin, it uses a 1 day timeout:

    MtBox Needs Your Riddle Answer Before It Can Answer Mine

    Chaining Together

    It’s fairly easy to see that longer paths are possible, using the same “timelocked” transactions.  The paper uses 1 day per hop, so if you were 5 hops away (say, me <-> MtBox <-> Carol <-> David <-> Evie <-> you) I would use a 5 day timeout to MtBox, MtBox a 4 day to Carol, etc.  A routing protocol is required, but if some routing doesn’t work two nodes can always cancel by mutual agreement (by creating a timeout transaction with no locktime).

    The paper refers to each set of transactions as contracts, with the following terms:

    • If you can produce to MtBox an unknown 20-byte random input data R from a known H, within two days, then MtBox will settle the contract by paying you 1c.
    • If two days have elapsed, then the above clause is null and void and the clearing process is invalidated.
    • Either party may (and should) pay out according to the terms of this contract in any method of the participants choosing and close out this contract early so long as both participants in this contract agree.

    The hashing and timelock properties of the transactions are what allow them to be chained across a network, hence the term Hashed Timelock Contracts.

    Next: Using Channels With Hashed Timelock Contracts.

    The hashed riddle construct is cute, but as detailed above every transaction would need to be published on the blockchain, which makes it pretty pointless.  So the next step is to embed them into a Poon-Dryja channel, so that (in the normal, cooperative case) they don’t need to reach the blockchain at all.

    April 01, 2015 11:46 AM

    March 30, 2015

    Rusty

    Lightning Networks Part I: Revocable Transactions

    I finally took a second swing at understanding the Lightning Network paper.  The promise of this work is exceptional: instant reliable transactions across the bitcoin network. The implementation is complex and the draft paper reads like a grab bag of ideas, but it truly rewards close reading!  It doesn’t involve novel crypto, nor fancy bitcoin scripting tricks.

    There are several techniques which are used in the paper, so I plan to concentrate on one per post and wrap up at the end.

    Revision: Payment Channels

    I open a payment channel to you for up to $10

    A Payment Channel is a method for sending microtransactions to a single recipient, such as me paying you 1c a minute for internet access.  I create an opening transaction which has a $10 output, which can only be redeemed by a transaction input signed by you and me (or me alone, after a timeout, just in case you vanish).  That opening transaction goes into the blockchain, and we’re sure it’s bedded down.

    I pay you 1c in the payment channel. Claim it any time!

    Then I send you a signed transaction which spends that opening transaction output, and has two outputs: one for $9.99 to me, and one for 1c to you.  If you want, you could sign that transaction too, and publish it immediately to get your 1c.

    Update: now I pay you 2c via the payment channel.

    Then a minute later, I send you a signed transaction which spends that same opening transaction output, and has a $9.98 output for me, and a 2c output for you. Each minute, I send you another transaction, increasing the amount you get every time.

    This works because:

    1.  Each transaction I send spends the same output; so only one of them can ever be included in the blockchain.
    2. I can’t publish them, since they need your signature and I don’t have it.
    3. At the end, you will presumably publish the last one, which is best for you.  You could publish an earlier one, and cheat yourself of money, but that’s not my problem.

    Undoing A Promise: Revoking Transactions?

    In the simple channel case above, we don’t have to revoke or cancel old transactions, as the only person who can spend them is the person who would be cheated.  This makes the payment channel one way: if the amount I was paying you ever went down, you could simply broadcast one of the older, more profitable transactions.

    So if we wanted to revoke an old transaction, how would we do it?

    There’s no native way in bitcoin to have a transaction which expires.  You can have a transaction which is valid after 5 days (using locktime), but you can’t have one which is valid until 5 days has passed.

    So the only way to invalidate a transaction is to spend one of its inputs, and get that input-stealing transaction into the blockchain before the transaction you’re trying to invalidate.  That’s no good if we’re trying to update a transaction continuously (a-la payment channels) without most of them reaching the blockchain.

    The Transaction Revocation Trick

    But there’s a trick, as described in the paper.  We build our transaction as before (I sign, and you hold), which spends our opening transaction output, and has two outputs.  The first is a $9.99 output for me.  The second is a bit weird–it’s 1c, but needs two signatures to spend: mine and a temporary one of yours.  Indeed, I create and sign such a transaction which spends this output, and send it to you, but that transaction has a locktime of 1 day:

    The first payment in a lightning-style channel.

    Now, if you sign and publish that transaction, I can spend my $9.99 straight away, and you can publish that timelocked transaction tomorrow and get your 1c.

    But what if we want to update the transaction?  We create a new transaction, with a $9.98 output to me and a 2c output to a transaction signed by both me and another temporary address of yours.  I create and sign a transaction which spends that 2c output, has a locktime of 1 day and has an output going to you, and send it to you.

    We can revoke the old transaction: you simply give me the temporary private key you used for that transaction.  Weird, I know (and that’s why you had to generate a temporary address for it).  Now, if you were ever to sign and publish that old transaction, I can spend my $9.99 straight away, and create a transaction using your key and my key to spend your 1c.  Your transaction (1a below) which could spend that 1c output is timelocked, so I’ll definitely get my 1c transaction into the blockchain first (and the paper uses a timelock of 40 days, not 1).

    Updating the payment in a lightning-style channel: you sent me your private key for sig2, so I could spend both outputs of Transaction 1 if you were to publish it.

    So the effect is that the old transaction is revoked: if you were to ever sign and release it, I could steal all the money.  Neat trick, right?

    A Minor Variation To Avoid Timeout Fallback

    In the original payment channel, the opening transaction had a fallback clause: after some time, it is all spendable by me.  If you stop responding, I have to wait for this to kick in to get my money back.  Instead, the paper uses a pair of these “revocable” transaction structures.  The second is a mirror image of the first, in effect.

    A full symmetric, bi-directional payment channel.

    So the first output is $9.99 which needs your signature and a temporary signature of mine.  The second is 1c for you.  You sign the transaction, and I hold it.  You create and sign a transaction which has that $9.99 as input, a 1 day locktime, and send it to me.

    Since both your and my “revocable” transactions spend the same output, only one can reach the blockchain.  They’re basically equivalent: if you send yours you must wait 1 day for your money.  If I send mine, I have to wait 1 day for my money.  But it means either of us can finalize the payment at any time, so the opening transaction doesn’t need a timeout clause.

    Next…

    Now we have a generalized transaction channel, which can spend the opening transaction in any way we both agree on, without trust or requiring on-blockchain updates (unless things break down).

    The next post will discuss Hashed Timelock Contracts (HTLCs) which can be used to create chains of payments…

    Notes For Pedants:

    In the payment channel open I assume OP_CHECKLOCKTIMEVERIFY, which isn’t yet in bitcoin.  It’s simpler.

    I ignore transaction fees as an unnecessary distraction.

    We need malleability fixes, so you can’t mutate a transaction and break the ones which follow.  But I also need the ability to sign Transaction 1a without a complete Transaction 1 (since you can’t expose the signed version to me).  The paper proposes new SIGHASH types to allow this.

    [EDIT 2015-03-30 22:11:59+10:30: We also need to sign the other symmetric transactions before signing the opening transaction.  If we released a completed opening transaction before having the other transactions, we might be stuck with no way to get our funds back (as we don’t have a “return all to me” timeout on the opening transaction)]

    March 30, 2015 10:47 AM

    March 26, 2015

    Andreas

    Hunting down a fd closing bug in Samba

    In Samba I had a failing test suite. I have nss_wrapper compiled with debug messages turned on, so it showed me the following line:

    NWRAP_ERROR(23052) - nwrap_he_parse_line: 3 Invalid line[TDB]: 'DB'

    nss_wrapper should parse a hosts file like /etc/hosts, but the debug line showed that it was trying to parse a TDB (Trivial Database) file, Samba’s database backend. I started to investigate and wondered what was going on. This morning I called Michael Adam and we looked into the issue together. It was obvious that something closed the file descriptor for the nss_wrapper hosts file, and that the descriptor was then reused by Samba to open other files. The big question was: what the heck closed the fd? As socket_wrapper was loaded, and it wraps the open() and close() calls, we started adding debug output to the socket_wrapper code.

    So first we added debug statements to the open() and close() calls to see when the fd was opened and closed. After that we wanted a stacktrace at the close() call, to see the code path where it happens. Here is the patch we used:

    commit 6c632a4419b6712f975db390145419b008442865
    Author:     Andreas Schneider 
    AuthorDate: Thu Mar 26 11:07:38 2015 +0100
    Commit:     Andreas Schneider 
    CommitDate: Thu Mar 26 11:07:59 2015 +0100
    
        DEBUG stacktrace
    ---
     src/socket_wrapper.c | 37 +++++++++++++++++++++++++++++++++----
     1 file changed, 33 insertions(+), 4 deletions(-)
    
    diff --git a/src/socket_wrapper.c b/src/socket_wrapper.c
    index 1188c4e..cb73cf2 100644
    --- a/src/socket_wrapper.c
    +++ b/src/socket_wrapper.c
    @@ -80,6 +80,8 @@
     #include <rpc/rpc.h>
     #endif
     
    +#include <execinfo.h>
    +
     enum swrap_dbglvl_e {
     	SWRAP_LOG_ERROR = 0,
     	SWRAP_LOG_WARN,
    @@ -303,8 +305,8 @@ static void swrap_log(enum swrap_dbglvl_e dbglvl,
     		switch (dbglvl) {
     			case SWRAP_LOG_ERROR:
     				fprintf(stderr,
    -					"SWRAP_ERROR(%d) - %s: %s\n",
    -					(int)getpid(), func, buffer);
    +					"SWRAP_ERROR(ppid=%d,pid=%d) - %s: %s\n",
    +					(int)getppid(), (int)getpid(), func, buffer);
     				break;
     			case SWRAP_LOG_WARN:
     				fprintf(stderr,
    @@ -565,10 +567,35 @@ static int libc_bind(int sockfd,
     	return swrap.fns.libc_bind(sockfd, addr, addrlen);
     }
     
    +#define BACKTRACE_STACK_SIZE 64
     static int libc_close(int fd)
     {
     	swrap_load_lib_function(SWRAP_LIBC, close);
     
    +	if (fd == 21) {
    +		void *backtrace_stack[BACKTRACE_STACK_SIZE];
    +		size_t backtrace_size;
    +		char **backtrace_strings;
    +
    +		SWRAP_LOG(SWRAP_LOG_ERROR, "fd=%d", fd);
    +
    +		backtrace_size = backtrace(backtrace_stack,BACKTRACE_STACK_SIZE);
    +		backtrace_strings = backtrace_symbols(backtrace_stack, backtrace_size);
    +
    +		SWRAP_LOG(SWRAP_LOG_ERROR,
    +			  "BACKTRACE %lu stackframes",
    +			  (unsigned long)backtrace_size);
    +
    +		if (backtrace_strings) {
    +			size_t i;
    +
    +			for (i = 0; i < backtrace_size; i++) {
    +				SWRAP_LOG(SWRAP_LOG_ERROR,
    +					" #%lu %s", i, backtrace_strings[i]);
    +			}
    +		}
    +	}
    +
     	return swrap.fns.libc_close(fd);
     }
     
    @@ -704,6 +731,8 @@ static int libc_vopen(const char *pathname, int flags, va_list ap)
     
     	fd = swrap.fns.libc_open(pathname, flags, (mode_t)mode);
     
    +	SWRAP_LOG(SWRAP_LOG_ERROR, "path=%s, fd=%d", pathname, fd);
    +
     	return fd;
     }
     
    

    We found out that the responsible code created a pipe() to communicate with the child and then forked. The child called close() on the second pipe file descriptor. So when another fork happened in the child, close() was called on that pipe file descriptor again, and we closed an fd the process had open for a TDB, a connection or something like that. Initializing the pipe fd array with -1, and only calling close() on a file descriptor which is not -1, fixed the problem.

    If you need a better stacktrace, you should use libunwind. However, socket_wrapper can be a nice little helper for finding bugs with file descriptors ;)

    BUG: Samba standard process model closes random files when forking more than once


    March 26, 2015 01:22 PM

    March 23, 2015

    Andreas

    Android 5 on the Samsung Galaxy Nexus

    Another milestone, I got CyanogenMod 12.0 (Android 5.0.1) nearly fully working on the Samsung Galaxy Alpha (SLTE) Exynos version. Video playback is not working but I’m sure it will just be a matter of time …

    android_5_slte

    The source code is available here.


    March 23, 2015 09:40 PM

    Last updated: July 28, 2015 02:13 PM

    Donations


    Nowadays, the Samba Team needs a dollar instead of pizza ;-)

    Beyond Samba

    Releases