Get your website working in Mainland China – China Internet 101

 

In my 8+ years living and operating online businesses in Asia, I’ve learned a few hard lessons. Here’s a big one: China is different. When it comes to delivering a website, app, or any sort of internet-bound service, China requires far more planning and investment than any other market in the world today.

The internet is global by design. For most organizations, the infrastructure used to deliver a website in France is the same infrastructure used to deliver a website in Germany. China, however, has its own version of the internet. If your website, app, or business serves a mainland Chinese audience, you need to understand the differences between the internet you are used to and the internet ecosystem in the People’s Republic of China. Here are 5 differences technology professionals need to plan for.

Note: When “China” is mentioned below, I am referring to Mainland China. This excludes the special administrative regions of Hong Kong and Macau. It also excludes Taiwan, which is its own nation entirely. (But don’t tell China I said that.)

 


1 – The Great Firewall of China (GFoC)

The most famous of differences is “Project Golden Shield”, or what is colloquially known as the “Great Firewall of China” (GFoC).

The GFoC is China’s censorship apparatus. Its goal is to filter and block content and services from reaching the Chinese mainland that the government has deemed against its national interests. Some examples being:

  • Foreign (non-China) media websites such as the New York Times, CNN, and The Guardian.
  • Foreign social media platforms such as Twitter, Facebook, or Snapchat.
  • Foreign messaging apps such as Skype, Facebook Messenger, and WhatsApp.
  • Gambling websites of any kind
  • Pornography websites of any kind
  • Wikipedia
  • Websites disparaging or satirizing government figures or sowing public discontent / unrest. This includes content related to the Dalai Lama, or Falun Gong.
  • In addition to the criteria above, any website which the censors employed by “Project Golden Shield” deem to be offensive.

Wikipedia has a more detailed list of sites which are actively blocked, but it is far from exhaustive.

Although often described as a discrete network service, the GFoC is really a collection of technologies deployed by Chinese hosting providers, major Chinese tech companies, telecom providers, and the government itself. Through these technologies the GFoC can:

  • Block a website URL via DNS poisoning (a quick way to test for this is sketched after this list)
  • Deny traffic to and from a port, IP address, or IP address range via TCP resets or by simply dropping the traffic
  • Filter content by keywords in URLs
  • Block or interrupt services to VPNs
  • Intercept, block, or monitor unencrypted communications, and even some encrypted traffic, via man-in-the-middle attacks
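As a rough illustration of the DNS-poisoning point, here is a minimal sketch that compares the answers for a domain from a resolver outside China (Google) and a large public resolver inside China (114DNS). The domain is just a placeholder; wildly different or bogus A records for a known-blocked domain are a strong hint of poisoning.

#!/usr/bin/env bash
# Compare DNS answers for a domain from resolvers outside and inside China.
# 114.114.114.114 is 114DNS, a large public resolver in mainland China.
DOMAIN="${1:-twitter.com}"
echo "Google DNS (8.8.8.8):"
dig +short "$DOMAIN" @8.8.8.8
echo "114DNS (114.114.114.114):"
dig +short "$DOMAIN" @114.114.114.114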

Interestingly, GFoC blocking is not always black and white. In my experience working with various businesses in Asia Pacific, it’s common for some sites to be available at one time and unavailable at others. It’s also possible to be blocked in one region and not another. Many online businesses make the mistake of running a few tests from mainland China and assuming that if it works once, it works always. Nothing could be further from the truth.

It is also important to note that the GFoC’s methods of blocking are not obvious. At no time is a user in China presented with a dialog or message that a site has been blocked. It simply does not work. This is a very important distinction, because to a potential user in Mainland China, a site or app simply fails to load. The perception will likely be that the site or service is broken, not that the GFoC has stopped it from working. In other words, potential users will blame you, not their government or ISP.

To avoid being blocked by the GFoC, ensure these basic precautions are taken:

  • Do not offer ANY content or service which the GFoC would find offensive. It may work temporarily, but if you gain any sort of audience or sizable traffic within China – they will find it, and it will be blocked! This is not an “if”, but a “when”.
  • Even if your site or product is absent of content the Chinese government finds offensive, certain words or content types can raise additional scrutiny. Any media, social media, or video content will be watched with extreme caution and possibly banned without warning.
  • Encrypt your entire site using TLS whenever possible. This reduces the chances of false positives, and it typically speeds up most sites, as encrypted content cannot undergo the proxy filtering or deep packet inspection that GFoC perimeters employ. (A quick check for this is shown after this list.)
  • Avoid domains and URLs which contain “unsafe” words related to banned content. This could include video, media, or even seemingly harmless Chinese in-jokes like Winnie the Pooh.
  • Apply for an ICP license. This is really the only way to ensure your content is allowed to be served within China. Technically ANY SITE without an ICP license can be banned at any time, without warning or explanation.
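On the TLS point, a one-liner like the following can confirm that plain-HTTP requests to your site are redirected to HTTPS. The hostname is a placeholder; substitute your own domain.

# A healthy setup returns a 301/308 with a Location: https://... header.
curl -sI http://example.com/ | grep -iE '^(HTTP|location)'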


2 – Hosting + ICP Licensing

Hosting content within China remains the best option from a performance perspective. Having web servers and cloud services running within the country skips many of the GFoC and networking bottlenecks, ensuring your content is fast and available to Chinese users. Cloud services like Aliyun have made building complex web applications within Mainland China as easy as AWS. (Note: AWS also has a China presence, operated by a local partner.)

If your website or product allows, a separate China instance often solves many problems. Just be aware of the trade-offs. Hosting in China may mean that the site is slower everywhere else.

Some businesses cannot put webservers or sensitive information in China for security or compliance reasons. In this case, hosting in nearby Hong Kong, Taiwan, or Japan can provide decent latency, but beware the networking challenges explained in the next section.

Regardless of where you host your content, if you want it to stay accessible for Chinese users, you must have an “Internet Content Provider” (ICP) license. These licenses are issued regionally, and applications are typically handled by hosting providers. Technically, any website or domain which does not have an active ICP license can be blocked at any time. You must either be a Chinese citizen to apply for a personal license, or a Chinese business to apply for a corporate license. Both require strict “Know Your Customer” processes and in-person verification. This means foreign companies or individuals who wish to secure ICP licenses must work with a Chinese partner.

Websites are required to publish their ICP number in the footer of their site. The license contains the region in which it was issued and a unique number tied to that domain. Baidu’s ICP looks like this: “京ICP证030173号”. An ICP license applies to a root domain – in our example, all sub-domains of Baidu are covered by this license.
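Since the number must appear in the footer, you can spot-check any site’s ICP status from the command line. A rough sketch, using the Baidu example above (the grep pattern just looks for the 京ICP prefix):

# Prints the ICP number if it is present in the returned HTML.
curl -s https://www.baidu.com/ | grep -o '京ICP[^<"]*' | head -n 1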

Remember, securing a license does not make you exempt from the content rules above. ICP licenses are often rescinded at the first sign of trouble, usually accompanied by a swift block from the GFoC.


3 – Network Transit and Peering

Of all the technical differences between the global and Chinese internet, I find network transit the most interesting.

The internet is functionally a gigantic collection of separate networks, held together by common standards and protocols. Commercially, these large networks either pay to access one another, or they “peer” and swap traffic for free.

Mainland China’s internet backbone is handled by three state-owned ISPs: China Telecom (dominant in the south), China Unicom (dominant in the north), and China Mobile (nationwide). Peering points between these networks are extremely congested, often resulting in massive slowdowns to internet services in one region or another. I’ve personally seen more than one business operate in China for years, confident in their revenue growth in the south, not knowing (or not believing) they are non-functional in the north.

For global ISPs, “transit” traffic usually gets more expensive with distance. A 100 Mbps dedicated network connection between New York and Connecticut would be far cheaper than a 100 Mbps dedicated connection between New York and Dubai. In China, the exact opposite is true. Traffic could be $8 per Mbps of bandwidth between Guangzhou and Europe, and $80 per Mbps between Guangzhou and Hong Kong, which are neighboring cities.

The reasons for these inefficiencies are two-fold. One, congested peering and expensive bandwidth create great commercial opportunities for Chinese ISPs. Dedicated intra-China bandwidth between Tier 1 ISPs can go for over $120 per Mbps! That’s a lot of gravy.

Two, these inefficiencies throw up big cost barriers to foreign companies and have the added effect of slowing down foreign-hosted websites. Say, for instance, a Chinese user requests a web page hosted in nearby Japan. If the hosting provider doesn’t pay Chinese telecom companies for transit, the traffic will likely flow via California or Russia, or some other distant place where transit is cheaper. Between the inefficient network paths, the delays added by the GFoC, and congested peering, it’s easy to see how most foreign-hosted websites are slowed to a crawl.

China-focused CDNs such as ChinaCache solve many of these networking issues, as they cache content all around the country and have existing transit and peering agreements with Chinese ISPs. CDNs are also subject to ICP rules, so you will need a license before you can serve any content from them.


4 – DNS

DNS in China is also unique, for several reasons:

1 – It is an integral part of the GFoC, and hence tightly controlled and centralized.

2 – All Chinese-hosted DNS is unicast. There are no anycast DNS services.

3 – Global DNS providers rarely have DNS edge nodes in China, and when they do, they are on a separate network which requires additional cost.

This results in foreign sites being put at a disadvantage yet again. DNS lookups need to traverse the GFoC, and with the combination of network filtering, congestion, and the general flakiness of the UDP protocol, packets are often slow to return or go missing altogether.

The best solution is to use a China-based DNS provider like DNSPod or Alibaba DNS. This will keep your DNS records hosted within China, resulting in lower latency and higher availability. The trade-off is that even the more expensive paid versions of these DNS services have poor international coverage. This is another reason why I suggest a totally separate Chinese site if possible.
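To make the latency point concrete, here is a small sketch that compares lookup times against a resolver outside China and DNSPod’s public resolver inside China (119.29.29.29). Run from a client in the mainland, the difference is usually dramatic; the domain is just a placeholder.

#!/usr/bin/env bash
# Compare DNS lookup latency against two resolvers using dig's query-time stat.
for ns in 8.8.8.8 119.29.29.29; do    # Google DNS vs DNSPod Public DNS
  printf '%s: ' "$ns"
  dig +noall +stats example.com @"$ns" | awk '/Query time/ {print $4 " msec"}'
done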


5 – Software Design

Naturally, the challenges listed above cannot be overcome by infrastructure alone. Many optimizations need to be baked into the software itself, including:

  • Reducing site and asset size: The less data you move over congested Chinese ISPs, the better.
  • Removing dependencies on widgets and tools from foreign web companies: Common services like Google Fonts, Google Tag Manager, or Facebook’s tracking pixel are blocked in China and won’t be functional. The same goes for widgets like Twitter, YouTube, or SoundCloud embeds. (A quick way to scan for these is sketched after this list.)
  • Moving as many assets to China-based CDNs as possible: The more content you can put on edge nodes around the country, the better.
  • Ensuring compatibility with common Chinese browsers: Browser market share in China is quite different from other countries. Local browsers like UC Browser, Sogou Explorer, and QQ Browser are all fairly common on both desktop and mobile. Although they are WebKit based, some may introduce compatibility issues with your website or applications.
  • Ensuring compatibility with local phones: China has a variety of home-grown phone brands which are rarely seen outside of the country. Brands like Meizu, Vivo, Oppo, Xiaomi, and Huawei are all very popular. If you develop a complex mobile app or website, these devices will require extra customization.
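On the widget point, a build-time scan like the one below can catch references to commonly blocked third-party hosts before they ship. The host list and the dist/ directory are assumptions; adjust both to your own stack.

#!/usr/bin/env bash
# Scan built front-end assets for references to hosts blocked in mainland China.
BLOCKED='fonts\.googleapis\.com|googletagmanager\.com|connect\.facebook\.net|platform\.twitter\.com|youtube\.com'
grep -rEn "$BLOCKED" dist/ && echo "Blocked-host references found above." \
  || echo "No blocked-host references found."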
Think of China as 34 markets, not one.

 

Conclusion

Do not think of China as a single market. It is a gigantic country: the 4th largest in the world by landmass and the 1st in the world by population, with a staggering 1.4 billion people. It is divided into 34 provincial-level divisions, each with its own government. It operates at a size and scale that is difficult for Western nations to comprehend. (Have you ever heard of Tianjin? Probably not… but it is a Chinese city with 3x the population of Finland.)

China truly has its own unique flavor of the internet. Its government policies have created a cyberspace which favors home-grown players over foreign competition. Far too many internet companies have looked to China for growth, rushed in without careful planning, and failed miserably. Competing successfully in the Chinese market requires knowledge of and investment in both product and technical localization. With the right planning and tools, you can scale the great firewall and deliver the same experience to Chinese customers that users elsewhere enjoy.

If you plan to expand your online business into China, make sure you work with a team who has experience with and knowledge of the unique challenges in this market. My consultancy business, Takehan Technologies, specializes in optimizing online products for China and other emerging markets. Contact us if you’d like to learn more.

ZipMatch at AWS Summit Manila

I had the pleasure of presenting ZipMatch’s technology on April 20th at the AWS Summit Manila. You can check it out on the Summit On Demand page, or better yet, right here. 🙂

Since the notes on the SlideShare were just talking points, I thought I’d post a more complete write-up here. Enjoy!

 

Intro

Good morning everyone. Let me begin by saying it’s great to be here today. My name is Kyle, and I’m CTO at a real estate marketplace based here in the Philippines called ZipMatch. ZipMatch is about 4½ years old right now, and we’ve been running on AWS for about 4 of those years. Today, I’ll be sharing our journey with AWS: why we chose it, how our business has grown with it, and what’s next for the future.

Who is ZipMatch?

ZipMatch is a real estate marketplace based here in the Philippines. We focus on data and transparency, which enables renters and buyers to compare properties against each other in greater detail than classified sites. Using our unique algorithms, we can score and rank properties based on factors such as neighborhood, amenities, affordability, rental yield, price per square meter, and many others. We’re backed by a great group of Silicon Valley and regional investors, which include 500 Startups, Monk’s Hill Ventures, and Kickstart.

2013 – Our move to the cloud

When I joined ZipMatch in 2013, we were a small team, and our website was hosted on a single server in a datacenter in Makati. The server worked fine, but simple changes for security, networking, or storage would take days or sometimes weeks to be handled. Each change required multiple phone calls, e-mails, and usually paper contracts to be signed. Not exactly how a tech startup prefers to do business. None of us on the team had any experience with AWS at the time, but our reading told us a lot of the systems were based on existing standards, so after researching a few solutions, we agreed to run a proof of concept on AWS.

Things went quite smoothly. Within a week we had a fully functional copy of ZipMatch running in the Japanese AWS region. So we scheduled a time to move production to the cloud.

Then we realized, latency to Singapore would be faster for most of our users. We had deployed to the wrong country!

After some reading, we realized the issue was trivial. We followed a guide to take a snapshot of our working EC2 instance and redeploy it in Singapore. This all got done in about an hour, which made for some very happy geeks.
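For the curious, the same snapshot-and-redeploy move looks roughly like this with the modern AWS CLI. The instance and image IDs below are placeholders:

# 1. Create an AMI from the running instance in Tokyo (ap-northeast-1).
aws ec2 create-image --region ap-northeast-1 \
    --instance-id i-0123456789abcdef0 \
    --name "zipmatch-prod-snapshot" --no-reboot

# 2. Copy the AMI to Singapore (ap-southeast-1), then launch from it there.
aws ec2 copy-image --region ap-southeast-1 --source-region ap-northeast-1 \
    --source-image-id ami-0123456789abcdef0 --name "zipmatch-prod-sg"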

This was probably the first in a long list of wins where AWS and Cloud Infrastructure saved our business a lot time and money, and we knew we had made the right call.

2017 – Where we are today

So, that was roughly 4 years ago. From that single, humble server we’ve grown a lot. Here’s how things look now. We currently leverage about 22 AWS technologies. I don’t have enough time today to go into each one, but I’d like to share a few highlights. We run our development, staging, and production environments in the cloud using some of these technologies:

Compute / EC2

Half of our product is still driven by EC2 instances. We need multiple now because we have much more product complexity and about 100x the traffic. The instances are based in different availability zones for added redundancy. The traffic is split between them using the Elastic Load Balancer service.

Database / RDS

Our data is our biggest asset. We now run our primary databases in RDS in a master-slave configuration. RDS handles a lot of the housekeeping a DBA or sysadmin would typically have to do. We’re big fans of RDS because, aside from the initial migration, we’ve basically spent zero hours on it. It just works.

Static Content / CDN / S3 and Cloudfront

We use S3 for nearly all of our static content. CSS, JS, fonts, images are all stored here. This includes our virtual 360 tours and VR assets.

We add over 15,000 images a month to S3 from property listings alone. We leverage CloudFront, AWS’s CDN, on all of our S3 buckets to put content closer to our users and cut down load times. So although our infrastructure is based in Singapore, our Philippine users will load about 95% of the website from an edge node in Manila.
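As an illustration of that pipeline, pushing a batch of static assets to an S3 bucket fronted by CloudFront is a one-liner with the AWS CLI. The bucket name is hypothetical, and the long cache lifetime leans on CloudFront edge caching:

# Sync built assets to S3; far-future cache headers let edge nodes hold them.
aws s3 sync ./assets s3://zipmatch-static-assets \
    --cache-control "public, max-age=31536000"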

Lead Lifecycle / Event Bus

Each inquiry that comes in on ZipMatch is subject to many different business rules:

where leads are assigned, SMS and anti-spam checking, quality scoring and routing, and so on.

As this system grew in complexity, we moved it away from monolithic PHP code and into a more robust event-driven system. The business rules and logic are Python code running in Docker containers, and the system uses SQS as an event bus and DynamoDB for stateful storage.
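The event-bus pattern itself is simple enough to sketch with the AWS CLI. The queue name and payload below are made up for illustration; our real consumers are the Python containers mentioned above:

# Look up the queue (name is hypothetical).
QUEUE_URL=$(aws sqs get-queue-url --queue-name lead-events \
    --query QueueUrl --output text)

# Producer side: publish a "new inquiry" event onto the bus.
aws sqs send-message --queue-url "$QUEUE_URL" \
    --message-body '{"event":"inquiry.created","lead_id":12345}'

# Consumer side: long-poll for an event, process it, then delete it.
MSG=$(aws sqs receive-message --queue-url "$QUEUE_URL" --wait-time-seconds 10)
RECEIPT=$(echo "$MSG" | jq -r '.Messages[0].ReceiptHandle')   # requires jq
aws sqs delete-message --queue-url "$QUEUE_URL" --receipt-handle "$RECEIPT"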

We’ve had a bit of fun load testing it with a few thousand inquiries per minute. It handled them with ease, which means we all sleep better at night than we used to.

Data Science

Data is our edge at ZipMatch. We use it to understand our customers’ behaviour. We use it to spot trends in the market. We use it to avoid fraud and bogus information. We use data to dive into the statistics of each province, city, and barangay. We are a data-driven company through and through. Our analytics and data visualization are done through AWS’s data warehouse product, Redshift. We run nightly ETL processes which dump, transform, and upload data to Redshift. We also do periodic extracts from other 3rd-party data tools (e.g. Mixpanel) and upload them into Redshift for analysis and correlation with other data. Redshift allows us to centralize all of our data in one place and run giant queries we wouldn’t dare run in production.

Our data visualization tool of choice is a piece of software called Spotfire. In Spotfire we can build live, real-time dashboards for whoever needs them in the business. They are very powerful, and the end result is that most questions we have can be answered right away with a pre-built dashboard.

It was actually through our data analytics setup that we spotted a great market opportunity. Brokers often complain about a lack of data in the market. Government agencies do not publish many real estate statistics, and many records are on paper, making true market information difficult to come by. We realized that with our millions of data points from years of listing and project data, we could surface a lot of great information about Philippine real estate that would be a first in our industry.

2017 – Rankings and Statistics

In March of this year, we released a new section of our site dedicated to ranking all the projects in the Philippines by various criteria. Our rankings section allows you to sort projects or locations by investment data such as rental yield and per-sqm prices; lifestyle data, such as proximity to points of interest like schools, shops, and hospitals; and affordability data, looking for the best per-sqm deals in the country. This was not easy. Our rankings section provides a live summary of information from over 100,000 property listings and 2,000 projects. That’s over a million data points to crunch. The vision was to provide a dashboard into the Philippine real estate industry and rank properties by the factors our buyers and sellers value.

We delivered this by building a whole new web front-end and data API. We run these services in Docker containers, which are managed by an ECS cluster. We cache data in Redis, which relieves pressure on our backend databases.

We send traffic to the rankings section using the new Application Load Balancer service. ALB allows us to do layer-7 traffic routing, sending specific HTTP requests to the new web front-end.
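That layer-7 routing boils down to a path-based listener rule. A sketch with the AWS CLI, where the ARNs are placeholders:

# Forward /rankings* requests to the new front-end's target group.
aws elbv2 create-rule \
    --listener-arn arn:aws:elasticloadbalancing:ap-southeast-1:111122223333:listener/app/zipmatch/EXAMPLE \
    --priority 10 \
    --conditions Field=path-pattern,Values='/rankings*' \
    --actions Type=forward,TargetGroupArn=arn:aws:elasticloadbalancing:ap-southeast-1:111122223333:targetgroup/rankings/EXAMPLE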

We did a soft release in March this year. So far, the industry has been very excited about the data we’ve released through our rankings section. Buyers, renters, and even developers themselves have shown huge interest in the statistics, and data from our web analytics tools shows very high engagement so far.

All in all, it was an ambitious project and I’m confident in saying we could not have spotted the opportunity and delivered it so quickly without the tools AWS provides.

What’s next for ZipMatch?

Ultimately, our goal is to connect buyers and sellers. It sounds simple enough, but all the difficulty lies in the middle. The buying decision is so large, and the transaction so complicated, that the average renter or buyer is often weighing dozens of factors in their purchase. Is it within budget? What’s the commute time like? Is it close to schools for the kids? What sort of common areas does a project have? We have already begun down the path of leveraging data to help customers make better decisions; how can we ease this further? Perhaps smarter recommendations, using machine learning to constantly improve the algorithms which match buyers and sellers?

Or AI frameworks which help us understand customer needs and automate repetitive processes?

AWS has existing technologies for both AI and machine learning, so if and when we choose to leverage them, we will certainly look to AWS solutions first.

Running our business on the cloud with AWS has been a cornerstone of our product and IT strategy. It is a big part of why we’re here today. We’ve only scratched the surface of the many opportunities in our industry and we’re excited for what the future has in store.

5 Home Projects with my Raspberry Pi


Growing up in Canada, I always had a computer around the house to tinker with. It started with a Macintosh Plus, followed by a 486 (DX2!). In my adult years, I usually kept one tower alive just to tinker with: a computer to use as a home server of sorts, without the risk of nuking my main machine.

When I moved to the Philippines, lugging a big tower over was out of the question for a few reasons.

1 – Space. A 60 square meter (650 square foot) condo doesn’t leave a lot of room to run a bulky PC tower and monitor.

2 – Power. Power in Manila is at least 3x more expensive than in my native city of Vancouver. Running a PC all day just for hobby projects could mean another $750 to $1000 a year in power costs. No thanks.

3 – Cost. I wasn’t all that keen to build another tower, or pay to ship my existing ones over.

So needless to say, when I heard about the Raspberry Pi a few years back, it piqued my interest. It’s aimed at hobbyists. The mainboard is about the size of a credit card, it runs on a low-power ARM CPU, it runs a variant of Debian, and it costs about $35. Not bad!

I bought my first Pi about 2 years ago, just as they announced the Pi 2 Model B. I’ve since used it for a number of fairly practical home projects that I’ll detail below. True Pi hackers be warned: these are pretty normal use cases. No home automation or Spotify jukeboxes here, I’m afraid. Just boring sysadmin stuff. 😉

Project 1 – Netflix / Geo-unblocking

 

Around the time I got my Pi, Netflix was still unavailable in the Philippines, so getting it into my living room required a bit of monkey work. Although Netflix may be global now, there are services which allow you to pick any region you like, so you can swap to US Netflix content, or UK, Canada, etc. Pretty cool.

I take my Canadian content very seriously.

There are a number of services out there that can get around the geo-blocking. I wound up using a provider called Getflix. Getting Getflix to work off the bat is pretty simple: go to the website and sign up, “confirm” your current IP, and ensure the device you want to use has one of the Getflix DNS servers set. Simple enough.

However, there were two drawbacks to this sort of setup. One, devices would need to constantly switch DNS servers as they came and went from the house. Not exactly something I felt like doing each time I wanted to watch Netflix.

Second, with a home DSL connection, my IP would change and Getflix would stop working. The Getflix DNS servers are secured via IP. So each time my IP changed I needed to log into the Getflix website and update my IP. This wasn’t a major hassle (they actually provided a tokenized link to do it quickly), but still, the little things irritate me.

My goal was to automate all of this behaviour, so I could open Netflix on any device on my home network without additional steps. Automating these two things turned out to be pretty simple.

First, I installed and configured BIND on my Pi, using Google’s DNS as forwarders. Tutorial here.

Second, in /etc/bind/named.conf.options I added specific zones for Netflix, HBO Now, and any other services I wanted to use via Getflix, like so:

zone "netflix.com" IN {
    type forward;
    forwarders {
        54.251.190.247;
        106.186.22.8;
    };
};

zone "hbonow.com" IN {
    type forward;
    forwarders {
        54.251.190.247;
        106.186.22.8;
    };
};

This effectively forwarded all lookups for the netflix.com zone to Getflix instead of Google’s DNS. The end result? Any device on my wifi requesting netflix.com (or any of its subdomains) was pushed through Getflix to do its geo-unblocking magic. No more static settings.

Overall, I do find using BIND as a home DNS server massive overkill, but I couldn’t get DNSMasq or the other simple DNS servers I tried to forward lookups for a specific zone. So, there we are. ISP-grade DNS software running on my Pi so I can watch Netflix.

The next step was to update Getflix with my ever-changing IP address. Luckily this is pretty simple: Getflix has an API you can hit, so a single line in my crontab does the trick:

*/5 *    * * *   root    curl -u PUT_YOUR_API_KEY_HERE:x -X GET https://www.getflix.com.au/api/v1/addresses.json >/dev/null 2>&1

That’s it. Now whenever my IP changes, Getflix knows about it within 5 minutes. Seamless setup.

Project 2 – Fileserver

 

One thing I always loved about having a “home server” of sorts, was a central place to backup files and store large media. I had been manually doing this via a 2TB USB drive for the years prior to owning a pi. Turning this 2TB disk into a network fileserver was a simple process.

First, I had to install NTFS-3G. Sadly, my drive was formatted as NTFS, and I didn’t have any other place to dump 1.5TB of media, so installing an NTFS driver was a necessity.

Next, I installed and configured Samba. SMB isn’t my favorite protocol, but it is by far the most common. I set up a permanent mount point in fstab for my USB disk and configured Samba to share it under a password-protected account. Tutorial here.
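For reference, the fstab entry for an NTFS drive looks something like the line below. The UUID and mount point are placeholders; nofail keeps the Pi booting even if the drive is missing:

# Hypothetical /etc/fstab entry for the NTFS USB drive
UUID=0123-4567  /mnt/media  ntfs-3g  defaults,auto,uid=pi,gid=pi,umask=002,nofail  0  0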

Note: I found that the USB drive sometimes wouldn’t mount on boot, a common problem with USB drives on Pis. Adding rootdelay=5 to /boot/cmdline.txt gave the drive enough time to wake up and mount correctly.

All in all, probably only 20 minutes of work. The result? A fileserver accessible from any device on my home network.

Look at all that storage-ie goodness.

Recently, the app “FileExplorer” has made this setup especially useful, as I can stream videos or open docs from my fileserver on my iPhone. Same with VLC + Infuse on my Apple TV. Now, all around the house, classic Simpsons and Seinfeld episodes are just a few taps away.

I thought you said “go to bread”.

Project 3 – Torrent Server

 

Something I always really envied in the fancy QNAP NAS systems was the built-in torrent server. Since I had a central place for media and a hefty amount of disk space, running a torrent client on the Pi made sense.

My torrent client of choice, Transmission, has a Linux version with a web interface that runs on Raspbian. Check out the tutorial here.

Nothing here was too complicated, but I did need to apply a minor patch for some issues with the interface. Not sure if this code made it into the mainline yet, but here it is just in case you run into the same issue: https://trac.transmissionbt.com/ticket/4987

So now, adding a torrent to Transmission is as simple as logging into the web interface and uploading a .torrent file, or even better: pasting a magnet link.


This was all fine and dandy, but I could only start downloads from home. What if I wanted to start a big download remotely, so it would be ready when I got back?

Project 4 – VPN, Dynamic DNS

 

What good is a torrent-enabled fileserver that only downloads if you are at home?

The Raspberry Pi makes a great little VPN server using L2TP over IPsec. This setup is a little more complicated than the others, and involves changing some pretty important system files, so you may want to take a full backup before embarking on this project.

A recent, up-to-date guide to the setup can be found here. Note the issue with Openswan mentioned at the bottom. This had me stuck for a few days. Downgrading your Openswan version seems to be the only workaround for now.

All that’s needed from this point on is setting up the VPN client on your phone and laptop, and we’re in business. Starting a torrent, grabbing a document from the fileserver, or logging into my router are now all easy to do remotely.

Watching myself VPN in via LTE. Pictured: all the crazy IPsec forwarding, NAT’ing, etc. that I couldn’t hope to ever understand.
Queuing up torrents from my phone via VPN, because why not?

Assuming you are on a home connection without a static IP, the last step to tie this all together is some form of dynamic DNS. My router supported this with Dyn, but required a whole bunch of suspicious-sounding Linksys cloud login stuff. My dynamic DNS provider, No-IP, wasn’t compatible anyway, so that leaves configuring it on the Raspberry Pi.

If you aren’t afraid of some good old-fashioned compiling, setup and installation are pretty simple here.
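The build boils down to the usual grab-compile-install dance. From memory, the steps look roughly like this (the tarball URL is No-IP’s long-standing download location, but check their site for the current one):

wget https://www.noip.com/client/linux/noip-duc-linux.tar.gz
tar xzf noip-duc-linux.tar.gz
cd noip-2.1.9-1/            # version number may differ
make
sudo make install           # prompts for your No-IP account details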

The readme for the No-IP client suggests copying one of the relevant startup scripts (in this case debian.noip2.sh) to /etc/init.d, but I couldn’t get the client to boot on startup. I wound up writing my own:

#! /bin/sh

### BEGIN INIT INFO
# Provides:          noip2
# Required-Start:    $syslog
# Required-Stop:     $syslog
# Default-Start:     2 3 4 5
# Default-Stop:      0 1 6
# Short-Description: noip.com client service
### END INIT INFO

# . /lib/lsb/init-functions
case "$1" in
    start)
        echo "Starting noip2."
        /usr/local/bin/noip2
    ;;
    stop)
        echo "Shutting down noip2."
        killall noip2
        #killproc /usr/local/bin/noip2
    ;;
    *)
        echo "Usage: $0 {start|stop}"
        exit 1
esac

exit 0

As usual, make sure the script is executable, and run “update-rc.d noip2 defaults” to build all the symlinks needed for launching at boot.

Project 5 – Smokeping

 

I could write an entire blog post about how poor internet connectivity is in the Philippines and the various reasons why. While bandwidth may have improved in recent years ($80 USD gets me a whopping 14 Mbps), latency and packet loss can still be an issue, especially when traffic needs to head outside the Philippines (as it does for many websites, online games, etc.).

So, to keep tabs on my connection’s performance, I use Smokeping. It’s a piece of software which is definitely showing its age (written in Perl, runs out of CGI), but the data it collects and the way it displays it are immensely helpful.

Smokeping can use ping, or any other probe you configure, to run tests against various network devices. It’s a great tool for monitoring any sort of internet / WAN connection because it graphs the latency, packet loss, and jitter / variance of these metrics in one view. The deviation of ping responses is graphed as “smoke”, the averages are the mid line, and the color represents packet loss.


A very useful tool for tracking and troubleshooting connections all in all.

Now, my past experience with setting up Smokeping was not fun. There are few packages available, and compiling all the various legacy Perl libraries and getting them to play nice is a classic exercise in Linux library dependency hell. I was pleasantly surprised to learn, however, that there are Smokeping packages in the Raspbian repositories, so Smokeping’s setup is pretty simple on a Raspberry Pi.

Run “apt-get install smokeping” to install the service.

Then, edit your Targets file, located at /etc/smokeping/config.d/Targets

I configured mine like so: I have pings for my local router, Google DNS, and Google itself. I also run a regular DNS lookup against Google’s DNS to measure resolution times.

*** Targets ***

probe = FPing

menu = Top
title = KY-PI Smokeping Network Latency Monitor
remark = Welcome to the SmokePing website of  KMW. \
         Here you will learn all about the latency of our network.

+ Local

menu = Local
title = Local Network

++ LocalMachine

menu = Local Machine
title = This host
host = localhost
#alerts = someloss
+ Routers

menu = Routers
title = Routers for local LAN

++ Router1

menu = 5E-Router
title = Main Router
host = 10.10.1.1


+ Internet

menu = Internet Services
title = Internet Accessible Services

++ GoogleDNS
menu = GoogleDNS
title = Internet Latency to Google DNS
host = 8.8.8.8

++ Google
menu = Google.com
title = Latency to Google.com
host = www.google.com

++ DynamicDNS
menu = DynDNS
probe = DNS
title = Dynamic DNS Lookup
host = google.com
lookup = google.com
server = 8.8.8.8

To add a DNS probe, simply add the following to /etc/smokeping/config.d/Probes:

+ DNS

binary = /usr/bin/dig

Future Project – RetroPie

I’m a big emulator fan, and after reading about the RetroPie project and the recent release of the Pi 3, this looks like a great future project. I’ve got a new Raspberry Pi 3 on order, and an Xbox 360 controller with USB receiver ready to go. Looking forward to classic SNES and PSX games soon.

I hear the Kodi media suite is pretty cool as well, and fits nicely alongside RetroPie, so adding that to the Pi 3 may provide some bonus HTPC benefits.

Conclusion

 

Running a variant of a popular Linux distro on an ARM chip gives you a feel for what these processors can and will do in the future.

The Pi has really made me appreciate how far the ARM architecture has come. We all run these processors every day in our phones, tablets, and other mobile devices, but the software that powers these devices is so integrated and tied to the mobile ecosystem that it’s hard to appreciate what’s under the hood.

The Pi makes a pretty great home server, and with a distribution based on Debian, it’s pretty easy to rig up existing tools to get things working. When I look back at all the various projects I’ve tinkered with, it may be the highest-value electronics purchase I’ve ever made. Even with a fast SD card, a case, and a power adapter, you can put a Raspberry Pi together for $50 to $75. So if you’re a sysadmin type like myself, there are still plenty of uses for these babies. My recommendation is to pick one up and start hacking!

HTTP/2 is Awesome. Why aren’t more people talking about it?

If you are reading this, you probably use HTTP nearly every day. It is the basis of the World Wide Web and the backbone of most applications we use over the internet. This website, your Facebook app, and even smart TVs rely on HTTP to transfer data to and fro.

For such an important protocol, HTTP is pretty long in the tooth in terms of internet technology. Its last major update was HTTP/1.1, which rolled out in 1999. That makes it ancient by web standards. Remember 1999: US internet penetration was at 36%, AOL CDs were still arriving in the mail each month, and IE 4 and Netscape Navigator were squaring off for dominance in the great “browser wars”. Things have changed a lot since then.

Thankfully, a major update to the protocol began rolling out to web servers and browsers in early 2015: HTTP/2.

What is HTTP/2?

At first glance, HTTP/2 doesn’t look a whole lot different from HTTP/1.1. In terms of working with the protocol itself, engineers won’t need to change a thing to take advantage of its performance benefits. (Phewf!) The request methods are all the same (GET, PUT, DELETE, etc.). HTTP/2 focuses purely on the performance of these requests moving between client and server, and this is where stuff gets pretty cool.

There are far more detailed posts about how HTTP/2 differs from HTTP/1.1, but from my reading on the subject, these are the top enhancements:

1.) Multiplexed streams – HTTP/1.1 can only request and send data in a single “stream”. This means that for every image, CSS file, or font a browser requests, a new TCP connection is established and a new request made. These requests remain open, waiting for a response. If a browser asks for more than one thing (which it does about 99.99% of the time), multiple TCP connections are opened, each with its own HTTP stream, each one blocking overall progress as it waits for its resources. To add insult to injury, browsers set limits on how many connections they will make to a single domain, so developers used workarounds like domain sharding (multiple CDN subdomains) to request more resources at a time. Believe it or not, in the HTTP/1.1 world a best practice is to load 16 images, sharded over 4 CDN domains, with 16 separate TCP connections. This is all very wasteful. HTTP/2 improves this behavior dramatically by multiplexing the streams. This means data can be requested, sent, and transmitted over a single TCP connection simultaneously. The reduction in overhead from all the TCP setup and teardown is a big advantage on its own, but the ability to request and receive HTTP instructions without going into a “WAIT” state is massive, and arguably the biggest benefit of the new protocol.

2.) Server push: The current nature of HTTP is that each and every request goes through a roundtrip pattern like so:

1 – open a new connection.
2 – request the item from the webserver.
3 – receive the data
4 – close connection.

The problem with this pattern is that one resource often spawns requests for others. By the time the browser has loaded the base HTML, step 4 has been completed, and it needs to open new connections for images, CSS, JS, etc., which often in turn spawn MORE requests for more assets. Again, a fairly wasteful process with a lot of back and forth between client and server. Server push allows webservers to pre-emptively send content down the wire before the browser even requests it. This makes a lot of sense: instead of the 4-step process per item, multiple assets and files are sent back at step 3, reducing the need to constantly run through the full cycle.

Server push is one of the coolest HTTP/2 features, but its rollout seems to have been a bit slower than the HTTP/2 spec itself. (I assume this is due to some major architectural changes certain webservers will need to undertake.) Support is pretty mixed today: Nginx only supports server push as part of its paid “Nginx Plus” product, and CloudFront lacks support for server push, whilst Cloudflare announced support earlier this year. So, YMMV depending on your CDN / webserver of choice.
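An easy way to see what you’re actually getting from a given server or CDN is to ask curl which protocol was negotiated. This assumes a curl build with HTTP/2 support (7.50 or newer for the %{http_version} variable); the URL is just an example:

# Prints "2" when HTTP/2 was negotiated, "1.1" otherwise.
curl -sI --http2 -o /dev/null -w '%{http_version}\n' https://example.com/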


3.) Header compression – HTTP/2 uses HPACK to compress all HTTP headers. This is one of those “well, that makes sense” sorts of improvements. HTTP sends a lot of redundant headers back and forth, often with only minor changes. In HTTP/1.1 these headers were not compressed. In HTTP/2, HPACK compresses all HTTP headers and enables both client and server to keep a list of past headers for fast retrieval. This drops a decent-sized chunk of data from each HTTP request webservers have to deal with, leaving more bandwidth available for the important stuff, resulting in faster load times and less bandwidth consumption.
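If you want to watch this happening on the wire, the nghttp client from the nghttp2 project prints the individual frames, including the HPACK-compressed HEADERS frames (the URL is just an example):

# -n discards the response body, -v prints frame-level debug output.
nghttp -nv https://nghttp2.org/ | head -n 40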

4.) Binary – HTTP/2 converts requests and responses into binary before sending them over the wire. Binary protocols are more space efficient, more resilient to loss, and arguably harder to casually packet-sniff. So again, data moves slightly faster and bandwidth is reduced.

5.) Baked-in, fast TLS: Although the official HTTP/2 spec did not make HTTPS + TLS mandatory for HTTP/2, every major browser implementation did. (A smart move from Google, Mozilla, etc.; the browser giants want to move us to an all-HTTPS web ASAP.) So with that, HTTP/2 is effectively HTTPS-only and uses TLS 1.2.

Now, standard wisdom is that SSL / TLS makes sites a little slower, due to the CPU cycles needed for the encryption and decryption of data, as well as adding even more handshakes to an already wasteful process. But with HTTP/2 this isn’t the case.

HTTP/2 leverages a TLS extension called ALPN, which reduces the roundtrips caused by protocol negotiation. When a browser sets up a TLS connection, it lists its supported protocols in its first “hello” transmission. When the server responds to the initial “hello”, it does so with the selected protocol, which the browser switches to immediately. Fewer roundtrips, faster TLS setup.
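You can watch ALPN in action with openssl (1.0.2 or newer); the host below is just an example of a server known to speak HTTP/2:

# Offer h2 and http/1.1 during the handshake; the server picks one.
openssl s_client -alpn h2,http/1.1 -connect www.google.com:443 < /dev/null 2>/dev/null | grep -i alpn
# "ALPN protocol: h2" means HTTP/2 was selected within the same handshake.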

Additionally, the benefits of multiplexing and server push naturally reduce the handshake chatter, which keeps the CPU overhead from becoming noticeable. Most smartphones nowadays (even on the low end) have processing power to spare compared to their relatively slow data speeds.

So what does this mean for the web, mobile apps and other systems which use HTTP?

HTTP/2 is a major “under-the-hood” improvement which should greatly increase the speed of many content-heavy sites, while also reducing the bandwidth consumed by loading such sites. The TLS dependency may slow adoption in the short-term, but serve as a powerful incentive for webmasters and businesses to switch their entire operations to HTTPS.

As per usual, it’s no magic bullet. Sites which load an excessive number of objects on each page, or have very large payloads, will always be slow (although HTTP/2 will help incrementally).

Web engineers really don’t need to do much to support HTTP/2 aside from getting their sites ready for HTTPS and ensuring they have supported webservers and CDNs. Certain time-honored performance techniques like CSS + JS concatenation will probably not be as impactful, given the multiplexing and server-push technologies HTTP/2 introduces.

Sadly, the rollout will take some time. I was disappointed to see this WordPress site is still being served via HTTP/1.1. GoDaddy WordPress hosting seems to be using a fairly old version of Apache Traffic Server (5.3, which does not have HTTP/2 support; HTTP/2 is fully supported, with server push, as of 7.0). C’mon GoDaddy, upgrade yer shit!

I’m especially eager to see how HTTP/2 improves performance in developing countries, where mobile data is far more common than broadband, and latencies are often in the 100 – 500 ms range. Assuming sites are already optimized, HTTP/2 could make a HUGE impact on these sorts of connections, where latency and poor radio connectivity amplify the delays caused by HTTP overhead.

So to conclude: if you run a website or web application and HTTP/2 isn’t in your roadmap, take a closer look. Most websites will receive a moderate to substantial boost in performance from the migration, which for most modern web stacks could be completed over a weekend.

The proof is in the pudding: check out some cool demos of HTTP/1.1 vs HTTP/2 performance here and here.

Do you have questions about HTTP/2, or experience migrating your web app to it? Ask or comment below.