Review: Roam Mobility in Las Vegas with a Nexus 5

I recently returned from a five-day trip to Las Vegas, to once again play the low-limit blackjack at Hooters Casino Hotel, enjoy the complimentary drinks and see a few shows. I’ve done this before with friends, but the major change this time is that this is the first year I’ve had cell coverage in the US, thanks to Roam Mobility. I’d used them on a conference trip to San Francisco earlier this year and it was quite handy.

The general principle is that you pay $4/day for unlimited talk/text (including voice/SMS back to Canadian numbers), and also get a 300MB allotment of 4G/LTE data per day of the plan. Thus, if you buy three days you get 900MB to use at any time during the total plan. If you go through the allotment, it degrades to “unlimited” data at EDGE/128kbps speeds.

To get started, you have to buy a Roam SIM card ($20; comes in regular+micro combo pack or nano form factor) plus a $2 “LTE upgrade” fee as a one-time purchase. The SIM stays active for up to 365 days between plan purchases, so it’s worthwhile if you plan to use the service in the States at least once a year or can loan the SIM to someone who will use it. You also need an unlocked phone, and preferably one that supports the 1700MHz and 2100MHz bands used by T-Mobile US (Roam acts as an MVNO on T-Mobile service).
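As a rough sketch of the arithmetic above: the function and constant names below are my own and purely illustrative, the prices are the ones quoted in this post, and taxes or shipping are ignored.

```python
# Prices and allotments as quoted in this post; names are illustrative only.
DAILY_RATE = 4        # dollars per plan day
DAILY_DATA_MB = 300   # MB of 4G/LTE data per plan day, pooled over the plan
SIM_COST = 20         # one-time SIM purchase
LTE_UPGRADE = 2       # one-time "LTE upgrade" fee


def plan_summary(days, first_trip=True):
    """Return (total cost in dollars, pooled LTE data in MB) for a plan."""
    cost = days * DAILY_RATE
    if first_trip:
        # The SIM and LTE upgrade are only paid once.
        cost += SIM_COST + LTE_UPGRADE
    return cost, days * DAILY_DATA_MB


if __name__ == "__main__":
    print(plan_summary(3))                    # (34, 900)
    print(plan_summary(4, first_trip=False))  # (16, 1200)
```

So a first three-day trip works out to $34 with 900MB of pooled LTE data, and a later four-day plan on the same SIM is $16 with 1200MB.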

I wasn’t entirely satisfied with how Roam Mobility worked with my Nexus 5 device during the trip. Some of the issues I encountered are specific to the Android device and some seem to be network/service related on Roam/T-Mobile US’ part. Here are a few caveats to consider.

Scheduling: Plan Early

Roam’s website lets you schedule plans in advance to start at a specific time, in 15-minute increments. I had scheduled a plan to start at 10:30PM Pacific in preparation for our flight landing shortly before that, and had switched my APN to ‘roam’ beforehand. When we landed, I was unable to get voice, data or SMS service in Terminal 3 despite having “five bars” of coverage. While power cycling the phone, I received SMSes advising that I had fewer than 30 voice minutes remaining. There was also a stuck voicemail indicator that simply redirected me to the Roam customer service line (which is closed at that time of night).

Example messages I received while waiting for the Roam Mobility plan to activate. In this screenshot the phone is set to Eastern time, so subtract three hours.

 

Service did not activate until 11:08PM, after we’d reached the hotel. I received another SMS from the 7850 shortcode advising that the 4-day plan was now active with 1200MB of data. Two friends of mine who were using Roam as well had no such difficulties – they had bought their plans shortly before boarding the plane and started service immediately. Their service became active on first power-up in the airplane while waiting to arrive at the gate in Vegas. So in short, if you want your Roam plan to be active right as you land, be generous with the start time.

From perusing the Roam support site after the fact, if you run into a similar problem, you may want to try messaging ‘start’ to *7850 to see if this kicks off the account provisioning process right away. I suspect this is what customer service would have told me to try if I called within business hours, but there wasn’t an easy way to get this information. I’d suggest that Roam add this trick as a message to their IVR in the ‘activate service’ option.

LTE Intermittent on Nexus 5

The coverage near the hotel and on the south end of the Strip (specific locations I checked included MGM Grand, Excalibur, and as far up as Bellagio) was very good, consistently displaying 4 to 5 bars of LTE. It also received and delivered Hangouts and BBM messages promptly. Coverage was also good near the north end by the Neon Museum and the Fremont Street Experience area.

However, when we headed slightly off-Strip, further down E. Tropicana to the Pinball Hall of Fame, my phone lost connectivity with no data symbol near the network indicator. The workaround was to back the network type down to 3G/HSPA. On the Nexus 5 this option is under Settings > (Wireless & Networks) > More… > Mobile networks > Preferred network type. Immediately after I forced 3G coverage, data began flowing again.

My travelling companions also didn’t suffer this issue: one was using an unlocked iPhone 5S and didn’t do anything special with his network settings throughout the trip, and the other had a Nexus 4 without LTE capabilities. For me, this problem took the advantage away from the $2 LTE upgrade to the SIM. I had to keep the phone in 3G only mode for the rest of the trip to ensure I could get coverage everywhere.

No Mobile Hotspot on Nexus 5

Due to some apparent shenanigans with Android 4.4, the phone tries to use a different APN when in tethering/mobile hotspot mode. This is documented on the Roam Mobility site as an issue specific to the Nexus 5, but I believe it could affect all stock Android devices. A fix is promised, but the issue had definitely not been resolved as of September 2014. Hooters has (slightly pokey) WiFi in room for hotel guests, and that was good enough to get my laptop online for flight check-in and researching shows.

As a note, though, it’s not just Android devices with an APN issue; once again the support site advises that iOS devices may need to get a custom configuration for hotspot use, and that BlackBerry 10 users could also encounter issues getting mobile hotspot capabilities to work.

Wrapup

Overall, compared to the data roaming gouge-fest from the usual Canadian carriers, Roam can’t be beat despite the issues I ran into. Next time, I’ll look at bringing a different or secondary phone to compare network and activation behaviours.

For a longer trip, I’d also consider a prepaid T-Mobile SIM ($10 SIM plus $30/month, 5GB LTE data.) The main issue with a T-Mo US SIM is that you’ll have to have it shipped to your destination or get one at a convenience store or dealer in the States, whereas Roam has retail presence at a variety of stores in Canada or will ship you a SIM for free.

If you’re currently with WIND Mobile, they have a $15/month US roaming option that can be purchased for 30 days and then removed from your account, which makes it a less expensive option than even buying the Roam SIM the first time. It also uses T-Mobile US towers so coverage should be the same.

Update, 2014-10-15: Looks like the Roam support team on Twitter is advising use of the ‘wholesale’ APN now rather than ‘roam’ to resolve issues with data connectivity. Something to try out if you ran into similar issues. I expect I’ll be using Roam early next year again and will report back.

WordPress file permissions and upgrades with wpfix.py

(Post updated 2015-05-07 with the results of some helpful feedback from mbrowne. Comments, GitHub issues and pull requests are always welcome!)

I maintain a GitHub repository of small scripts that are useful (at least to me) and occasionally get comments or email about them. I received an email yesterday asking about WordPress file permissions when applied with wpfix.py, which is a simple Python wrapper around a few common filesystem operations. I’d initially written about it a few years ago as a utility to allow sites to auto-update.

Since wpfix.py was written, it appears that there have been some changes in the way that WordPress performs upgrades. I’ll excerpt the issue from the original email:

I have recently ran your script on our wordpress website to fix permission issue.

But we are getting below error while we try to upgrade wordpress from admin panel.

 

“This is usually due to inconsistent file permissions.: wp-admin/includes/update-core.php”

 

When i look the permission I could see update-core.php file have only read permission for webserver user “www-data”. Is your script designed to set 644 for files in this folder ?

-rw-r--r-- 1 username www-data  47326 Aug  1 06:09 update-core.php

 

I took it upon myself to read some of the WordPress code that performs core updates, as well as some of the documentation. To answer the original question, wpfix.py does set 644 permissions on all WordPress files in the directory tree, then goes through the wp-content directory and adds group write permissions only where necessary.

The auto-update documentation at http://codex.wordpress.org/Hardening_WordPress states:

When you tell WordPress to perform an automatic update, all file operations are performed as the user that owns the files, not as the web server’s user. All files are set to 0644 and all directories are set to 0755, and writable by only the user and readable by everyone else, including the web server.

Unfortunately this doesn’t seem to match with the behavior in the code – when a direct FS_METHOD is used for manipulating files rather than through FTP or SSH, operations get performed as the web server user (www-data). Therefore, the 644 permissions on wp-admin are too restrictive to allow core upgrades.

There are a few solutions to this problem:

  • If you do not accept the risks of the webserver (www-data) user having write access to your WordPress contents, use the wp-cli (http://wp-cli.org/) core update command running as the user that owns the WordPress files. This is my preferred method and it can be scripted to batch update sites.
  • If you completely control the webserver and can be assured that nobody will upload a potentially malicious plugin or execute code that traverses the filesystem, set the permissions to 664 for all files (not directories) under wp-admin and wp-includes directories and have the group set to www-data:

    • find "$WORDPRESS_DIR/wp-admin" -type f -exec chmod 664 {} \;
      find "$WORDPRESS_DIR/wp-includes" -type f -exec chmod 664 {} \;
      chgrp -R www-data "$WORDPRESS_DIR"/wp-{admin,includes}
    • I would not recommend this in a shared hosting environment. When you upgrade, the more permissive group write flag will be preserved on these files (see the WP_Filesystem function in wp-admin/includes/file.php for details on how FS_CHMOD_DIR and FS_CHMOD_FILE are set.)
  • If you have FTP or SSH access to the server, and want to upgrade using this technique, remove the define('FS_METHOD', 'direct'); line from wp-config.php. This ensures that file delete, write and move operations are performed as the FTP/SSH user.

I will be adding parameters to wpfix.py shortly to address the last two points, and allow users to either set more permissive permissions on wp-admin/wp-includes directories or remove the FS_METHOD define.
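For illustration, here is a minimal Python sketch of the second option (group write on files under wp-admin and wp-includes). This is not the actual wpfix.py code; the function name and its parameters are hypothetical, and it only changes file modes, not group ownership.

```python
import os


def add_group_write(wordpress_dir, subdirs=("wp-admin", "wp-includes")):
    """Set mode 664 on regular files under the given subdirectories.

    Directories are left untouched, mirroring `find ... -type f`.
    Returns the number of files changed.
    """
    changed = 0
    for subdir in subdirs:
        top = os.path.join(wordpress_dir, subdir)
        for root, _dirs, files in os.walk(top):
            for name in files:
                path = os.path.join(root, name)
                # Skip symlinks so we never chmod something outside the tree.
                if os.path.islink(path):
                    continue
                os.chmod(path, 0o664)
                changed += 1
    return changed


if __name__ == "__main__":
    import sys
    print(add_group_write(sys.argv[1]), "files set to 664")
```

Changing the group to www-data would still need a chgrp (or shutil.chown with a group argument) run with sufficient privileges, so I have left that step out of the sketch.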

Fixing SYSVOL DFS replication on Server 2012

Huge thanks to Matt Hopton at “How Do I Computer?” for this informative article on fixing DFS replication issues with the SYSVOL directory. In my case, symptoms were similar – AD group policies weren’t being successfully updated at a remote site with its own read-only domain controller. This showed up in the output of gpresult /h output.html: logon scripts that had been added on the main office DC earlier in the day could not be found on the branch domain controller.

Some additional notes:

  • Look in Event Viewer under Applications and Services Logs > DFS Replication for a warning with ID 2213, which provides the wmic command needed to resume replication
  • If the DC has been out of sync too long, there will be an Error with ID 4012; use:
    wmic.exe /namespace:\\root\microsoftdfs path DfsrMachineConfig set MaxOfflineTimeInDays=65

    and replace 65 with a number that is above the “server has been disconnected from other partners” value. Then, rerun the wmic command from the first event. Give it a few minutes and be patient and if all goes well, another event will pop into the log indicating successful initialization of the SYSVOL folder.

 

Restoring Windows on a Lenovo X230 with WIM/SWM files

After a bit of house tidying over the past few days, I managed to locate the power adapter for my Lenovo X230 laptop. Upon booting it up I realized that it had accumulated a number of outdated applications and crufty configurations, so I wanted to restore it to factory settings. Ordinarily I would immediately image with a stock Windows ISO, but since I’d paid for a Win7 Pro license, and wanted all the Lenovo applications restored (volume/brightness OSD, battery monitor in taskbar) I specifically wanted to restore to the OEM version and then remove the trialware.

The first problem that I ran into was that since Linux has been on this machine, and I’ve swapped the internal HDD to an Intel SSD, the partition table and MBR weren’t exactly in their original state. As a result, the common technique of pressing the ThinkVantage button at startup (or Enter, when prompted) and choosing to restore the system (or pressing F11) wasn’t going to work; the option is simply not present. To be clear, I still had three partitions in place: primary partition 1, a 1499MB “SYSTEM_DRV”; primary partition 2, a ~208GB “Windows7_OS” partition mapped to C:\, and primary partition 3, a 13.67GB “Lenovo_Recovery” partition mapped to Q:\. The SYSTEM_DRV and Lenovo_Recovery partitions both still had their files intact.

My first attempt was to hit F8 at boot, just prior to the Windows logo appearing, and select the “Repair your computer” option. From the WinPE-style GUI, I selected the bottom option called Lenovo System Recovery from the usual list of repair options (Startup Repair, Command Prompt, System Restore, etc.) This unceremoniously bombed out with a “Recovery failed” message and the wizard would not proceed. The Internet was generally pretty useless on this point, although various guesswork posts suggested that if you didn’t have a pristine OEM partition table the process would fail.

The next option was to try and create a set of recovery disks from Windows. Of course, the X230 doesn’t actually have a DVD-RW drive, so I figured I’d have to trick it into writing to USB somehow. Running the “Create Recovery Media” application (or Q:\LenovoQDrive.exe) I was thwarted by some quasi-DRM mechanism that insisted that I’d already created a set of media and to go away. Who knows, I may have managed to get it to create disks to an external DVD writer when I first bought the machine. As a side note, you’ll want to enable viewing of hidden and system files to more easily work with the Q drive and review its contents.

The Internet was much more forthcoming on bypassing this ludicrously stupid mechanism, with Stack Overflow to the rescue. Apparently there are two very similar ways to restore recovery disk creation, depending on your laptop:

  • The utility uses an NTFS alternate data stream in Q:\FactoryRecovery\Recovery.ini:DONE, set to 0 and a newline when the media set hasn’t been created and 1 and a newline when in “no recovery set for you” mode. The fix:
    echo 0 > Q:\FactoryRecovery\RECOVERY.INI:Done
  • If this doesn’t work, edit Q:\FactoryRecovery\service_done.ini, and set DONE=0.

Unfortunately this entire process was fruitless, because even when I was able to get the recovery application to start and agree to begin creating disks, I was beset by this lovely error:

[Screenshot: create_recovery_media error dialog]

 

The next thing I tried was to install a newer version of Lenovo Rescue and Recovery (4.52), which also failed during the second phase of the installer with a return code 6 error message. Once again, this seemed to be related to my partition and bootloader monkeying.

At this point I’d had enough time to inspect the contents of the Q drive, and determined that there were several key files that would recover the system partitions:

  • Q:\factoryrecovery\cdrivebackup.swm through to Q:\factoryrecovery\cdrivebackup6.swm – a multipart WIM file with the original contents of the C:\ partition
  • Q:\factoryrecovery\sdrivebackup.wim – a single part WIM file with the original contents of the S:\ partition

There are a number of tools that will read WIM and SWM files, but the most commonly referenced one is imagex.exe, part of the Windows Automated Installation Kit. It’s since been replaced by DISM for Windows 8, but I managed to get things restored with imagex. In case you don’t want to download the entire 1.7GB WAIK archive, here is the 64-bit version of imagex that worked for me.

I followed approximately the same steps from this Lenovo forum post with some minor changes:

  • I created a Windows 7 Pro SP1 installation USB stick from an ISO with the Windows 7 USB/DVD Download tool. If you need a Windows 7 ISO, I suggest looking for Digital River links and verifying the hashes against the original versions. This isn’t piracy – these ISOs still need a product key but we won’t even get that far, since we’ll be restoring the Lenovo copy of Windows.
  • Copy the imagex.exe file to the root of the USB stick.
  • Boot to the Win7 USB stick in BIOS, then choose the Command Prompt option. Determine where all your drives are mapped and replace drive letters in the commands below to reference the appropriate drive. For me, the mapping showed up as:
    • C:\ – 1499MB SYSTEM_DRV partition
    • D:\ – 208GB Windows7_OS partition
    • E:\ – 13.67GB Lenovo_Recovery partition
    • F:\ – USB stick containing Windows 7 ISO contents and imagex.exe
    • X:\ – memory-mapped Windows PE contents
  • Run the diskpart commands from the original post from phil5 with the following changes, replicated here for posterity. Lines beginning with # are comments for readability, don’t type them:
    diskpart
    list disk
    
    # This shows a list of drives attached to your system. In my case I had two disks, disk 0 (solid state drive) and disk 1 (USB stick that I booted Windows from.)
    
    select disk 0
    
    list partition
    
    # Determine the correct partitions for SYSTEM_DRV, Windows7_OS and Lenovo_Recovery. The following operations will be destructive so make sure you have the correct partitions selected!
    
    # Deleting SYSTEM_DRV (S:\)
    
    select partition 1
    delete partition
    
    # Deleting Windows7_OS (C:\)
    
    select partition 2
    delete partition
    
    # My original partition 1 was 1499MB, not 1199MB as indicated in the original post; command has been adjusted
    
    create partition primary size=1499
    select partition 1
    active
    format fs=ntfs label="SYSTEM_DRV" quick
    assign letter=s
    
    create partition primary
    select partition 2
    format fs=ntfs label="Windows7_OS" quick
    assign letter=c
    
    exit
  • Recover the system with WIM/SWM files from the Lenovo_Recovery partition on Q:\ – the C drive is a multipart SWM archive and requires the /ref switch:
    F:\imagex.exe /apply /verify Q:\factoryrecovery\Sdrivebackup.wim 1 S:\
    F:\imagex.exe /apply /verify /ref Q:\factoryrecovery\Cdrivebackup*.swm Q:\factoryrecovery\Cdrivebackup.swm 1 C:\
    
  • Once complete, run the X:\bootrec.exe /fixmbr and X:\bootrec.exe /fixboot commands. Choose the Reboot option in the GUI, and again boot to the Windows 7 USB media. When entering, I was prompted to repair my boot process and restart – this was successful in bringing Windows back to a “Preparing to start your computer for the first time” splash screen and restoring the normal boot order.

After this process, you can run through the usual mechanisms of removing the Norton Internet Security trial and clicking OK to hilariously-worded dialog boxes like:

[Screenshot: lenovo_system_update dialog box]

Congratulations! I would suggest that you perform the following action to make your life easier in future, since I still wasn’t able to successfully create a set of official recovery media after this was all done.

Get a 16GB or larger USB stick – 32GB would be ideal. Image it with a Windows 7 Pro SP1 ISO via the USB/DVD creator tool, then copy the Q:\factoryrecovery folder and imagex.exe to it. That way you have an all-in-one restore mechanism for your ThinkPad that can be run even if the Q:\ partition is removed in a horrible accident.

For now I’m just happy to have the system running again, even if I have 769.1MB of updates pending to download and install.

Better Mario Kart 8 connectivity using pfSense

One of the more entertaining games I’ve played recently has been Mario Kart 8. Even though I’m not very good, it’s great with friends, despite what seems like Nintendo’s complete aversion to online gaming.

Since I’m used to the better mechanism of party chat on Xbox Live, typically I run Skype on a laptop throughout the session to the group of friends I’m playing with. The MacBook Pro built-in mic/speaker combination does decent enough noise cancellation, which means that multiple players in the same physical room can both spew profanity and have profanity spewn right back.

However, the configuration needed to get online play functioning properly (and staying working) is not exactly straightforward, or even correct, in most places on the Internet. Most troubleshooting steps, including those from Nintendo for connection error 118-0516, eventually advise that you place the WiiU console in a DMZ. I find this to be unacceptable, because:

  • (Update July 24/14: pfSense does have an option similar to a traditional DMZ, called 1:1 NAT and found under the Firewall / NAT / 1:1 configuration page. I still wouldn’t recommend it though, given that you may want to keep routing certain ports to different services. Original content follows in struck out text.) pfSense doesn’t have a DMZ in the traditionally easily-configurable “send all inbound packets on any port to this NAT’d device” sense. There are ways to configure a DMZ in the more traditional network admin sense (eg: a separate network for Web servers) but there’s not an “easy button” equivalent to the Linksys/Netgear/D-Link version.
  • It might be a bit neckbeardy, but I don’t like giving full inbound access from the Internet to a device behind my firewall, despite the fact that it’s running a more restrictive OS and network stack.
  • Just ‘forwarding all ports’ to the WiiU interferes with other servers that I like to run on the same network connection. Lists of ports online elsewhere advise forwarding all UDP and a whack of (incorrect) TCP ports, but if I want to run Skype I don’t want to blindly send everything to the console.

The real key to getting past connection error 118-0516 with pfSense is to enable static port NAT for the WiiU. This setting also applies to other devices that use the Nintendo Network, such as the 3DS for Animal Crossing. Ensure you have performed basic network troubleshooting before reading further: these steps WILL NOT help if you have WiFi with packet loss or poor Internet connectivity in general.

(Seriously, don’t assume your WiFi is great because it works well for Internet browsing. Get numbers with ping and traceroute over an extended period of time, and correct the problem if it’s related to wireless signal quality or a flakey router.)

(Scroll down further if you just want to know how to change this setting.)

Rationale behind Static Port = No

Source port rewriting (that is, Static Port = No) is described briefly in the pfSense documentation as a security feature to avoid someone determining the device or OS behind your firewall, and is turned on by default.

The entire process works like this: when a LAN device attempts to create an outbound connection, there is both a source and destination port associated with that request. So your computer might request a website on destination port 80, but that request is “coming from”, or “sourced” from a port above 1024 picked by your operating system – something like 57894. You can see these ports by running netstat -n from your computer’s command prompt or terminal, and noting how local addresses with high ports have established connections to foreign addresses with standard ports like 443 and 80.

pfSense, thinking it’s doing us a favour, sees that outbound request and picks a “more random” source port on the WAN side to avoid exposing the internal source port. It keeps track of the LAN source port to WAN source port mapping in a state table. A more detailed example of this transaction might be:

Computer at 192.168.1.100 wants to get a website at http://example.com/ (port 80).
The OS on 192.168.1.100 starts a request from port 57894:

192.168.1.100:57894 > example.com:80

pfSense has to keep track of this request and send it out of the WAN connection. 
It picks a new random port (eg: 32564) when it gets to the Internet connection.

192.168.1.100:57894 > modem.bigisp.com:32564 > example.com:80

pfSense now knows the state:
port 57894 internal == port 32564 external

Then when the request comes back from example.com, pfSense reverses the mapping: 
example.com has a source port of 80 and sends data back to port 32564:

example.com:80 > modem.bigisp.com:32564 > (pfSense state) > 192.168.1.100:57894

To avoid going much further into this, WiiU and other Nintendo Network devices don’t like how pfSense does the source port translation, and as a result you will see connection errors when trying to establish a session with other players.
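To make the difference concrete, here is a toy Python model of the two behaviours. It is only a sketch of the idea, not how pfSense is implemented: the class and method names are invented for illustration, and it ignores details such as port collisions between devices in static mode.

```python
import random


class OutboundNat:
    """Toy model of outbound NAT source-port handling (not pfSense code)."""

    def __init__(self, static_port=False, seed=None):
        self.static_port = static_port
        self.states = {}  # WAN source port -> (LAN IP, LAN source port)
        self._rng = random.Random(seed)

    def translate(self, lan_ip, lan_port):
        """Return the source port used on the WAN side for this connection."""
        if self.static_port:
            wan_port = lan_port  # keep the port the device picked
        else:
            # Pick an unused port so the internal port is not exposed.
            wan_port = self._rng.randint(1024, 65535)
            while wan_port in self.states:
                wan_port = self._rng.randint(1024, 65535)
        self.states[wan_port] = (lan_ip, lan_port)
        return wan_port

    def reverse(self, wan_port):
        """Map a reply arriving on wan_port back to the LAN device."""
        return self.states[wan_port]


if __name__ == "__main__":
    rewritten = OutboundNat(static_port=False)
    print(rewritten.translate("192.168.1.100", 57894))  # some other port
    static = OutboundNat(static_port=True)
    print(static.translate("192.168.1.50", 57894))      # 57894
```

With static_port=True the console's source port survives the trip through the firewall unchanged, which is the behaviour the rule further down asks pfSense for.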

Set a Static IP or Static DHCP Lease

Since we will be configuring firewall rules for a single, specific device and don’t want another device to accidentally take over this IP, the WiiU should have a static IP set, or a static DHCP lease assigned in pfSense. To set a static DHCP lease, access Status > DHCP Leases and locate the WiiU console in the list. Click the ( + ) button next to the device and provide an IP address outside of the usual range – for example, if I had a DHCP range of 192.168.1.100 to 192.168.1.254, I might make the WiiU IP address 192.168.1.50. It may be useful to note in the description that this static lease is for the WiiU console.

Save the configuration and restart the DHCP server when pfSense prompts, then power cycle the WiiU.
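If you want to sanity-check a candidate address before typing it in, a few lines of Python with the standard ipaddress module will do it. This is just a sketch: the function name is made up, and the 192.168.1.0/24 LAN is an assumption based on the example addresses above.

```python
import ipaddress


def outside_pool(candidate, pool_start, pool_end, network):
    """True if candidate is on the LAN but not inside the DHCP pool."""
    ip = ipaddress.ip_address(candidate)
    in_lan = ip in ipaddress.ip_network(network)
    in_pool = (
        ipaddress.ip_address(pool_start) <= ip <= ipaddress.ip_address(pool_end)
    )
    return in_lan and not in_pool


if __name__ == "__main__":
    # DHCP range from the example above; the /24 network is assumed.
    pool = ("192.168.1.100", "192.168.1.254", "192.168.1.0/24")
    print(outside_pool("192.168.1.50", *pool))   # True
    print(outside_pool("192.168.1.150", *pool))  # False
```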

Changing the Static Port Setting

From the top menu, access the Firewall > NAT option, then select the Outbound tab. The first two options are:

  • Automatic outbound NAT rule generation (IPsec passthrough included) – the default
  • Manual Outbound NAT rule generation (AON – Advanced Outbound NAT)

You will need to change this mode to “Manual Outbound NAT rule generation”, if not already present, and save. Once saved, some rules should be automatically generated for LAN to WAN traffic as well as ‘localhost to WAN’.

Leave these rules alone and add a new rule by using the ( + ) button at the top of the list. Set the following properties:

Do not NAT: unchecked
Interface: WAN
Protocol: any
Source:

  • Type: Network
  • Address: The address of the WiiU (eg: 192.168.1.50), with a “/32” in the dropdown box
  • Source port: (leave blank)

Destination:

  • Type: any
  • Address: (leave blank, should be disabled)
  • Destination port: (leave blank)

Translation:

  • Address: interface address
  • Port: (leave blank)
  • Static port: check this box

No XMLRPC Sync: Unchecked, only useful in a multiple pfSense environment
Description: I provided “WiiU AON, static port”

Advanced Outbound NAT for WiiU

When complete, ensure the rule is at the top of the list, then click Apply Changes. Your screen should look like the following image (although not necessarily including the OpenVPN rule):

pfSense AON List

After this, exit and re-enter Mario Kart 8 or your other Nintendo Network software. You should be able to join games and participate in online multiplayer sessions.

Wait, it’s still not working!

Still having troubles getting into games, or having other people join yours? You may need to perform some port forwarding operations. While there are suggestions to forward a number of TCP and UDP ports, I’ve run a packet capture during several multiplayer sessions with the following notes:

  • No TCP traffic was initiated or received by the WiiU across >30 minutes of successful Mario Kart 8 game sessions. This means that any port forwarding techniques involving TCP are placebos at best.
  • The WiiU does not attempt to establish a UPnP or NAT-PMP session on the router.
  • UDP ports are selected somewhat randomly from the list of non-privileged ports (>1024). In the sample session, the lowest port I saw was 9103 and the highest port was 61320.

Given these details, you could forward UDP ports 1025-65535 to the WiiU IP address in Firewall > NAT > Port Forward, but I would suggest limiting this range even further to UDP 49152-65535 (the dynamic ports as specified by IANA). An example screenshot with this configuration is provided for your convenience.

WiiU Port Forwarding
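A quick way to compare the two forwarding ranges against ports seen in a capture is sketched below. The names are illustrative only, and the two sample ports are the lowest and highest values from the sample session mentioned above.

```python
UNPRIVILEGED = range(1025, 65536)   # the wide 1025-65535 forward
IANA_DYNAMIC = range(49152, 65536)  # the narrower 49152-65535 forward


def covered(port, forward):
    """True if a UDP port falls inside the forwarded range."""
    return port in forward


if __name__ == "__main__":
    for port in (9103, 61320):
        print(port, covered(port, UNPRIVILEGED), covered(port, IANA_DYNAMIC))
```

Note that 9103 is covered by the wide range but not by the IANA dynamic range, so the narrower forward may not cover every session; more captures would help decide how tight the range can safely be.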

What’s next?

I intend to continue to run packet captures during Mario Kart 8 sessions with this configuration to collect more data, as well as review the pfSense firewall logs during any disconnections to see if any traffic is being explicitly blocked. I didn’t capture entire content, but to replicate the packet capture:

  • Enable SSH on the pfSense box
  • SSH in as root, and select option 8 (Shell) from the menu
  • tcpdump -i em1 -w /tmp/wiiu.pcap 'src 192.168.1.50 or dst 192.168.1.50'  – where em1 is your LAN interface, and 192.168.1.50 is the IP address of the WiiU
  • Start game session, Ctrl+C once complete and SCP the .pcap to a different machine for analysis with Wireshark or other tools
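If you would rather not open Wireshark just to answer “was there any TCP, and which UDP ports did the console use?”, the capture can be summarized with a short standard-library script. This is a rough sketch under some assumptions: it only understands classic pcap files (not pcapng) with Ethernet framing and IPv4, it ignores VLAN tags and fragmentation, and the function name is my own.

```python
import struct
from collections import Counter


def summarize_pcap(path, console_ip="192.168.1.50"):
    """Count TCP/UDP packets and collect the console's UDP ports."""
    protocols = Counter()
    udp_ports = set()
    console = bytes(int(part) for part in console_ip.split("."))
    with open(path, "rb") as handle:
        header = handle.read(24)
        magic = header[:4]
        # The magic number tells us the byte order of the file.
        if magic in (b"\xd4\xc3\xb2\xa1", b"\x4d\x3c\xb2\xa1"):
            endian = "<"
        elif magic in (b"\xa1\xb2\xc3\xd4", b"\xa1\xb2\x3c\x4d"):
            endian = ">"
        else:
            raise ValueError("not a classic pcap file")
        if struct.unpack(endian + "I", header[20:24])[0] != 1:
            raise ValueError("only Ethernet captures are handled")
        while True:
            record = handle.read(16)
            if len(record) < 16:
                break
            _sec, _frac, incl_len, _orig = struct.unpack(endian + "IIII", record)
            frame = handle.read(incl_len)
            # Ethernet header is 14 bytes; EtherType 0x0800 means IPv4.
            if len(frame) < 34 or frame[12:14] != b"\x08\x00":
                continue
            ip = frame[14:]
            header_len = (ip[0] & 0x0F) * 4
            proto = ip[9]
            if proto == 6:
                protocols["tcp"] += 1
            elif proto == 17 and len(ip) >= header_len + 4:
                protocols["udp"] += 1
                src_port, dst_port = struct.unpack(
                    "!HH", ip[header_len:header_len + 4]
                )
                # Record the port on the console's side of the conversation.
                if ip[12:16] == console:
                    udp_ports.add(src_port)
                elif ip[16:20] == console:
                    udp_ports.add(dst_port)
    return protocols, udp_ports


if __name__ == "__main__":
    counts, ports = summarize_pcap("/tmp/wiiu.pcap")
    print(dict(counts))
    if ports:
        print("UDP ports used by console:", min(ports), "to", max(ports))
```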

Fix forwarding to Gmail with a Linode Postfix/Dovecot mail server

Recently I decided to rebuild our main Debian Squeeze host as a 64-bit Debian Wheezy (7.0; I believe the template is 7.3 as of the time of this writing) VPS. This box runs web hosting, email, internal IRC, shell access and basically any other services that one of our beloved sudoers would like to try. Both of these hosts live in Linode’s Newark datacenter.

Linode will pro-rate your account if you cancel a server in the middle of a month, so both instances (old and new) are currently running for a minimal net cost. You can also assign private IPs to each host and SCP data or mount NFS between servers without cutting into your bandwidth quota, as well as attain a slight transfer speed improvement. This gives us plenty of time to move finicky services and make sure that the new configuration is working as intended. We’ve been cutting over individual user websites and mail services one by one to lessen the impact.

One of the problems I ran into that did not exist on the old host was email forwarding. We use the Linode Library: Email with Postfix, Dovecot, and MySQL article as a basis for a mail server that supports multiple domains, IMAP mailboxes and aliases that forward to multiple accounts. Most of our traffic is forwarding operations, typically to multiple users at once as sort of a poor-man’s distribution list. The main forwarder is an internal alias for Slightly Sauced group discussions, which sends messages out to everyone’s preferred mail provider.

Initially there didn’t seem to be any problems with the mail setup. I use Exchange Online for my personal email and messages were coming and going properly to me. When I went to reply to an existing thread, I soon got a bounceback message for only a few of the email recipients on the list. Checking /var/log/mail.log, I found the following lines (truncated for brevity and sanitized to not mention any specific email addresses):


Feb 26 21:58:24 services02 postfix/qmgr[28676]: 2F6FA6AB68: from=<me@example.com>, size=31176, nrcpt=9 (queue active)
Feb 26 21:58:24 services02 postfix/smtp[10250]: 2F6FA6AB68: to=<recipient1@googleapps.example.com>, orig_to=<distribution@example.com>, relay=ASPMX.L.GOOGLE.com[2607:f8b0:400d:c04::1b]:25, delay=8.6, delays=8/0.01/0.13/0.46, dsn=2.0.0, status=sent (250 2.0.0 OK 1393451904 g88si838453qgf.126 - gsmtp)
Feb 26 21:58:24 services02 postfix/smtp[10253]: 2F6FA6AB68: to=<recipient2@dreamhost.example.com>, orig_to=<distribution@example.com>, relay=mx1.sub4.homie.mail.dreamhost.com[208.97.132.226]:25, delay=8.7, delays=8/0.02/0.24/0.39, dsn=2.0.0, status=sent (250 2.0.0 Ok: queued as 84D4B76807B)
Feb 26 21:58:24 services02 postfix/smtp[10251]: 2F6FA6AB68: to=<recipient3@gmail.example.com>, orig_to=<distribution@example.com>, relay=gmail-smtp-in.l.google.com[2607:f8b0:400d:c00::1a]:25, delay=8.7, delays=8/0.01/0.06/0.58, dsn=5.7.1, status=bounced (host gmail-smtp-in.l.google.com[2607:f8b0:400d:c00::1a] said: 550-5.7.1 [2600:3c03::f03c:91ff:fe6e:423f 12] Our system has detected that 550-5.7.1 this message is likely unsolicited mail. To reduce the amount of spam 550-5.7.1 sent to Gmail, this message has been blocked. Please visit 550-5.7.1 http://support.google.com/mail/bin/answer.py?hl=en&answer=188131 for 550 5.7.1 more information. r6si610718qcl.69 - gsmtp (in reply to end of DATA command))
Feb 26 21:58:26 services02 postfix/smtp[10252]: 2F6FA6AB68: to=<recipient4@exchange.example.com>, orig_to=<distribution@example.com>, relay=example-com.mail.protection.outlook.com[207.46.163.215]:25, delay=11, delays=8/0.02/0.34/2.3, dsn=2.6.0, status=sent (250 2.6.0 <72c119e2692a422cbc733234ced8599a@SN2PR03MB046.namprd03.prod.outlook.com> [InternalId=49989124371267, Hostname=BY2PR03MB041.namprd03.prod.outlook.com] Queued mail for delivery)
Feb 26 21:58:26 services02 postfix/bounce[10256]: 2F6FA6AB68: sender non-delivery notification: D6C4C6AB90
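
In a longer log it can be tedious to spot which recipients failed. The following is a minimal Python sketch, assuming the delivery lines follow the same to=/relay=/dsn=/status= layout as the excerpt above; the function name failed_deliveries is made up for illustration and is not part of Postfix.

```python
import re

# Matches Postfix delivery lines shaped like the excerpt above.
# Assumption: to=, relay=, dsn= and status= appear in that order.
DELIVERY = re.compile(
    r"to=<(?P<to>[^>]*)>.*?relay=(?P<relay>[^\[,]+)"
    r".*?dsn=(?P<dsn>[\d.]+), status=(?P<status>\w+)"
)


def failed_deliveries(log_text):
    """Return (recipient, relay, dsn) for every delivery that was not 'sent'."""
    failures = []
    for line in log_text.splitlines():
        match = DELIVERY.search(line)
        if match and match["status"] != "sent":
            failures.append((match["to"], match["relay"], match["dsn"]))
    return failures
```

Feeding it the contents of /var/log/mail.log returns one tuple per bounced or deferred recipient, which makes it easier to see whether the failures cluster on a single relay.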

The first thing I noticed is that the users on Dreamhost IMAP, Exchange Online, and Google Apps / Google Hosted accounts did not have the forwarded message rejected, while forwards going to plain @gmail.com addresses were immediately rejected. I reviewed the Google Support document without much luck, and then stumbled across some documentation from Tanguy Ortolo about Google’s IPv6-related email restrictions. While Tanguy’s workaround was a good one (force IPv4 connections to Google mail servers), I tried to solve the problem while maintaining IPv6 connectivity. Linode does let you set reverse DNS (PTR) records for both IPv4 and IPv6 addresses, as long as they forward-resolve correctly.

  • In the Linode control panel or your own DNS management system, establish A and AAAA records for the server that is transmitting email. (eg: mailserver01.example.com.) Wait for the records to be resolvable and check with the dig a $hostname; dig aaaa $hostname commands on an IPv6 enabled system.
  • In the Linode control panel, find the individual VPS and access Remote Access. In the Public IPs section, click Reverse DNS.
  • On the Reverse DNS page, look up the domain name (mailserver01.example.com) with the provided tool. If your A and AAAA records are present and functional, Linode will ask whether you would like to use mailserver01.example.com as the reverse DNS entry for both your IPv4 and IPv6 addresses. Click Yes to both options.

Within 24 hours, forwarded mail flow to Gmail accounts should begin working properly.
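
To confirm that the PTR and A/AAAA records agree (the “forward-resolve correctly” requirement), a forward-confirmed reverse DNS check can be scripted. This is a rough Python sketch using only the standard library; is_forward_confirmed is a hypothetical helper name, and the resolver functions are passed in as parameters purely so the logic can be exercised without a live network.

```python
import ipaddress
import socket


def _normalize(addr):
    # Drop any IPv6 zone suffix ("%eth0") and compare parsed addresses,
    # so differently written forms of the same address still match.
    return ipaddress.ip_address(addr.split("%")[0])


def is_forward_confirmed(ip, reverse=socket.gethostbyaddr,
                         forward=socket.getaddrinfo):
    """True if a PTR name for `ip` resolves back to the same address."""
    target = _normalize(ip)
    try:
        name, aliases, _ = reverse(ip)
    except OSError:  # no PTR record, or the lookup failed
        return False
    for host in [name, *aliases]:
        try:
            infos = forward(host, None)
        except OSError:
            continue
        # Each getaddrinfo entry ends with a sockaddr tuple; item 0 is the IP.
        if any(_normalize(info[4][0]) == target for info in infos):
            return True
    return False
```

Calling is_forward_confirmed() with the server’s public IPv6 address from an IPv6-enabled machine should return True once the records have propagated.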

I also took this opportunity to update my SPF records as Exchange Online was fairly restrictive about certain types of messages that I sent through this server. I had to update my SPF record to: v=spf1 a mx include:spf.protection.outlook.com include:example.com ~all, where example.com was the domain of our Debian mail server that had its own SPF record.

Update, June 29/14: You also want to ensure that your SPF record doesn’t have more than ten total DNS lookups, recursively including all ‘include:’ directives. Exchange Online adds quite a few. Use http://www.kitterman.com/spf/validate.html to confirm that the SPF entry passes with the pyspf checker.
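
For a rough idea of how the ten-lookup budget adds up, here is a small Python sketch that counts the lookup-causing terms (include, a, mx, ptr, exists and redirect=) recursively. It is not a replacement for the pyspf checker: the records are supplied as a plain dictionary standing in for DNS TXT queries, and count_spf_lookups is a made-up name for this example.

```python
# SPF mechanisms that cost a DNS lookup; ip4, ip6 and all are free.
LOOKUP_MECHANISMS = ("include", "a", "mx", "ptr", "exists")


def count_spf_lookups(domain, records, _path=()):
    """Count DNS-querying terms for `domain`, following include/redirect.

    `records` maps a domain to its SPF record text. It is a stand-in
    for real TXT lookups (an assumption made for this sketch).
    """
    if domain in _path:  # guard against include loops
        return 0
    path = _path + (domain,)
    total = 0
    for term in records.get(domain, "").split()[1:]:  # skip "v=spf1"
        term = term.lstrip("+-~?")
        if term.startswith("redirect="):
            total += 1 + count_spf_lookups(term.split("=", 1)[1], records, path)
            continue
        name, _, arg = term.partition(":")
        name = name.split("/")[0].lower()  # "a/24" counts as "a"
        if name in LOOKUP_MECHANISMS:
            total += 1
            if name == "include" and arg:
                total += count_spf_lookups(arg, records, path)
    return total
```

With real records, each include: target has to be fetched and added to the dictionary first (for example from dig txt output); anything above ten means the record is likely to fail validation.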

Fixing stuttery or frozen USB on a Supermicro motherboard

For future reference and edification: on certain Supermicro motherboards running Windows 7 or newer (including Server variants like 2008 R2 or 2012), USB keyboard and mouse devices will act ‘stuttery’ or freeze during input. The issue I encountered was specifically missed keystrokes or repeated letter presses, and it appeared to be present even before the OS had booted.

A response in the Spiceworks community pointed me to the Supermicro FAQ, which states:

Question
In X9DRW-iF, USB Mouse can’t work under Winodws 7.

Answer
Please change “ISOC” setting from disable to enable in the BIOS menu for problem solving. (This item locates at Advanced / Chipset / NB / QPI Config –> ISOC)

To clarify, reboot the server and get into BIOS by pressing Delete at startup when prompted. “NB” refers to North Bridge.

Home networking overkill with a Lanner FW-7540

I’ve recently run into a few issues with my home networking setup. In pure overkill fashion, I’ve bought some new hardware to deal with it all and hopefully, in the process, learn a bit more about different network configurations.

One of my main problems at this point is related to location. After buying a house last year, I still have yet to make significant progress on the “Ethernet to every room” project. Wireless is great and has drastically improved since the early gear, but even the 802.11ac standard and equipment is no substitute for the reliability and consistent speed of a gigabit wired line. 802.11ac routers right now can push 180Mbps throughput at 1 meter, but throughput quickly diminishes with additional distance, other devices and the wireless adapters involved in the whole fiasco.

For the wired setup, I have all of the means to complete the process – or at least think I do until moving to whatever the next phase of the process is. At that point there’s usually much cursing, an order or two to Monoprice, and even a trip to Home Depot. Over the year I’ve relocated my folding table of tech gear to the basement, and there’s already quite a convenient hole in the floor to run some wiring through. As a result, my main tech closet in the basement all runs Ethernet, and I’m less inclined to start sawing drywall and drilling holes to the second floor on a whim.

Another problem I was seeing was poor wireless and routing performance in general. I’ve had the Netgear WNDR3700 in place for about two years now, and it’s run both stock firmware and DD-WRT with varying success. I’d highly recommend the router with stock firmware for most home configurations, but DD-WRT seems to occasionally stop sending and receiving traffic on the 5GHz wireless interface.

With a router replacement, there are three main components to be aware of:

  • Router/NAT device, to handle Internet connection traffic and route it to the corresponding internal client
  • Switching equipment – usually built in to the router, but additional capacity is generally needed down the line for more than four systems or avoiding lengthy cables
  • Wireless radio interface – again, usually built into the router

I decided to split this up a bit into its logical components. For the router/NAT device, my friend Matt sold me on a Lanner FW-7540, which is essentially a small-form-factor box with four Intel gigabit Ethernet ports and a dual-core Intel Atom CPU. The machine easily runs software like pfSense, which is a FreeBSD distribution with a Web interface and some configuration utilities on top. It’s incredible software and very powerful.

For switching equipment, I turned off DHCP on the Netgear router and am not using the WAN (Internet) port, turning it into a wireless access point plus four-port gigabit switch. I believe there is an option to reassign the WAN port to a LAN port, but I am not entirely lacking for ports near the cable modem at this point. Other locations in the house utilize 8-port Monoprice gigabit switches and that’s probably what I’d put in if the Netgear died or started acting up.

The last part of the equation is wireless access, and I’m waiting for the Ubiquiti UniFi AP AC to become reasonably commercially available. For now, I’m expecting a UniFi AP Pro to start. Even in a residential neighbourhood, I typically see upwards of a dozen networks in range and would like a more powerful, better-located access point to serve the systems here.

So, what have I learned about this setup?

Serial access to the Lanner console is a bit of a fun time. The device includes an RJ-45 to DB-9 serial adapter, so I had to hunt for which devices around the house had a serial port. You’ll also want to have a basic understanding of how serial terminals work.

Installing pfSense – when picking the kernel, select the option that is not symmetric multiprocessing, or you’ll lose console access on the first boot. Initial configuration for making the device behave like a usual router/switch involves not only setting up “OPT1” and “OPT2” interfaces to be bridged to the LAN, but configuring the built-in firewall to allow all traffic between them. I accidentally set the firewall allow rules to only let TCP traffic pass between the network interfaces, and that basically ruined functionality for anything plugged into ports 3 and 4 on the Lanner.

IP range selection is a good thing to plan out completely, especially if you’re a moron and pick the same range that your office uses to assign to VPN clients and a number of internal systems. Stick to low-numbered 192.168.x.y subnets to interfere with the least amount of connectivity, and select the appropriate netmask. I picked 10.0.0.0/8 and was in a world of hurt reconfiguring the network the next time I had to work from home.
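
A quick way to sanity-check a candidate range before committing to it is Python’s standard ipaddress module. This is only a sketch: the “office” ranges below are invented placeholders, so substitute whatever your VPN actually hands out.

```python
import ipaddress


def conflicting_ranges(home_range, other_ranges):
    """Return the entries of `other_ranges` that overlap `home_range`."""
    home = ipaddress.ip_network(home_range)
    return [
        other for other in other_ranges
        if home.overlaps(ipaddress.ip_network(other))
    ]


# Hypothetical office/VPN ranges, for illustration only.
office = ["10.20.0.0/16", "172.16.5.0/24"]
print(conflicting_ranges("10.0.0.0/8", office))      # the /8 swallows 10.20.0.0/16
print(conflicting_ranges("192.168.7.0/24", office))  # no overlap
```

An empty list means the home range will not collide with the ranges you listed, though it cannot account for networks you did not think to include.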

Don’t dual DHCP or you’ll end up with what looks like periodic packet loss. Running a continuous ping to the router showed maybe two “Request timed out” results every twenty minutes or so. This interrupted music mounted from another computer as well as the Internet connection. Make sure all other DHCP servers are turned off or locked down appropriately!

(Messages in the pfSense logs for this condition look like repeated instances of the following block)

Apr 15 01:18:02 pfsense kernel: arp: 192.168.1.100 moved from 00:1b:21:b0:7e:bb to 34:bb:1f:bb:0a:f8 on em1
Apr 15 01:18:15 pfsense kernel: arp: 192.168.1.100 moved from 34:bb:1f:bb:0a:f8 to 00:1b:21:b0:7e:bb on em1
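
If the log is long, a short script can list which addresses are being fought over. Below is a minimal Python sketch that assumes the “arp: … moved from … to … on …” wording shown above; contested_addresses is a hypothetical helper name, not a pfSense tool.

```python
import re
from collections import defaultdict

# \s+ before "to" tolerates log lines that were wrapped when copied.
ARP_MOVED = re.compile(
    r"arp: (\S+) moved from ([0-9a-f:]{17})\s+to ([0-9a-f:]{17}) on (\S+)"
)


def contested_addresses(log_text):
    """Map each IP to the sorted MAC addresses seen claiming it."""
    claims = defaultdict(set)
    for ip, old_mac, new_mac, _iface in ARP_MOVED.findall(log_text):
        claims[ip].update((old_mac, new_mac))
    return {ip: sorted(macs) for ip, macs in claims.items() if len(macs) > 1}
```

Any IP that shows up in the result is being claimed by more than one MAC address, which is a hint to go looking for a second DHCP server or a duplicate static assignment.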

Update 1: Useful sites that helped sort this out were:

And finally, have a UPS on all critical parts of the network path. They’re reasonably inexpensive and it’s nice to be able to still have Internet access during a power outage situation.

Fix issues signing in and updating apps from the Mac App Store

Problem: The Mac App Store on my laptop refused to allow me to update existing applications, download new ones, sign in to my account or view existing downloads. Neither “Store > Sign In” nor the “Sign In” link on the Featured page would display the usual login dialog. Attempting to update existing applications showed the usual “spinner” in the top toolbar with no progress.

Dead ends: Several suggestions on the Apple Discussion forums pointed to anti-virus and firewall involvement. None of these were applicable to my situation, as I was attempting all of these actions from an unrestricted TekSavvy cable connection.

Solution: This post on the Apple Discussion forums provided the initial help, but was incomplete in its solution. First, close out the App Store, then enable the debug menu by running

defaults write com.apple.appstore ShowDebugMenu -bool true

from the Terminal. Launch the App Store again, and choose Debug > Clear Cookies and Debug > Reset Application. Quit and relaunch the App Store, and you should be able to sign in and download updates successfully.

WordPress phpass generator: resetting or creating a new admin user

Again, in case I forget: If you’d like to reset a WordPress password from the database or create a new administrative user:

  1. Generate a PHPass hash using this mainframe8 tool.
  2. Insert a new row, or update an existing row, in the wp_users table. Use the hash from the tool in the user_pass column.
  3. If you’re adding a new administrator, insert the following values into wp_usermeta and replace user_id (2 in this example) with the newly created account’s ID:
    INSERT INTO wp_usermeta (`umeta_id` , `user_id` , `meta_key` , `meta_value`) VALUES
    (NULL , '2', 'wp_capabilities', 'a:1:{s:13:"administrator";b:1;}'),
    (NULL , '2', 'wp_user_level', '10');
  4. Enjoy a fixed WordPress admin account.