
How to fix the Percona repo failure when installing Percona Toolkit

Here’s a solution to the not-so-long-standing issue of the Percona yum repo being broken for the CentOS 6 x86_64 version of the percona-toolkit package. The repo listing reports an older version of the RPM that is no longer available on the site, so the fix is to download the newer file and tell yum to install it locally. The side benefit is that you can use yum to manage the RPM without adding the Percona repo, since the default settings for their repo have been known to cause conflicts with Base repo versions of MySQL packages; the Percona repo instructions set 'enabled=1', which is not a great idea if you’re not set up to use the yum priorities method of repo weighting.
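
If you do keep the Percona repo installed, a safer setup is to leave it disabled by default and enable it only on demand. A minimal sketch of /etc/yum.repos.d/percona.repo (the section name and gpgkey path here are assumptions, not copied from Percona's instructions; the baseurl matches the URLs used in this post):

[percona]
name = CentOS $releasever - Percona
baseurl = http://repo.percona.com/centos/$releasever/os/$basearch/
enabled = 0
gpgcheck = 1
gpgkey = file:///etc/pki/rpm-gpg/RPM-GPG-KEY-percona

With 'enabled = 0' you install explicitly via 'yum --enablerepo=percona install percona-toolkit' and the repo can never silently override Base packages.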

So, if you see this after installing the repo via the instructions on their site:
Downloading Packages:
http://repo.percona.com/centos/6/os/x86_64/percona-toolkit-2.1.9-1.noarch.rpm: [Errno 14] PYCURL ERROR 22 - "The requested URL returned error: 404"
Trying other mirror.

Error Downloading Packages:
percona-toolkit-2.1.9-1.noarch: failure: percona-toolkit-2.1.9-1.noarch.rpm from percona: [Errno 256] No more mirrors to try.

The solution takes only two simple commands:
wget http://repo.percona.com/centos/6/os/x86_64/percona-toolkit-2.2.1-1.noarch.rpm
yum install ./percona-toolkit-2.2.1-1.noarch.rpm

Or you could even be so bold as to combine it via a seamless call without downloading locally, but I didn’t bother.
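
For the bold, yum can install straight from a URL, so the combined call should look like this (same package as above, just no local copy kept):

yum install http://repo.percona.com/centos/6/os/x86_64/percona-toolkit-2.2.1-1.noarch.rpm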


Cloudflare, now offering to be your Single Point of Failure

There have been many articles about the downtime issue with Cloudflare last week, so I won’t get into the technical details of that. However, there’s fine print to remember. Consider this a subtle reminder that core Internet infrastructure services like Cloudflare’s DNS-based “Always Online” caching and packet-inspection security services do not come with Service Level Agreements, even at the “Pro” account level. With a Pro account you are paying for a service with no uptime guarantee, and you can only hope that it resolves your sites the majority of the time. This is fine; this is what the contract says: no SLA unless you pay for the Business account. An odd naming convention, given that most professionals are using their websites for business and would want the SLA, but I digress.

So, the SLA is not really the issue if you look at the standard architectural approach to maintaining availability when your primary and secondary DNS servers potentially go offline. The typical design uses more than one, and certainly more than two, DNS servers for your domain so that your domain addresses still resolve if the primary and even the secondary go offline. Typically these servers sit on separate subnets and even in separate geographical regions, so that events like tsunamis and datacenter fires do not take out both your primary and secondary name servers; in other words, there are options for a third and fourth resolver, but not with Cloudflare.
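
You can see how many name servers any domain publishes with a quick dig query; for example (domain and name servers hypothetical):

$ dig +short NS example.com
ns1.example-dns.net.
ns2.example-dns.net.

A domain limited to two entries like this has no fallback if both resolvers go dark.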

Cloudflare limits you to using only their DNS servers for your domain, of which they provide just two resolvers, not three or four like most DNS services. So if you wish to add a third or fourth name server entry to ensure resolution even if the primary and secondary Cloudflare DNS servers go offline, well sorry, you cannot do so. Cloudflare will disable your domain in their system if you use any DNS entries that are not their own, which includes a third or fourth setting. So now you have your “Professional” websites using a DNS and security service that has no SLA but for which you are paying for “Professional” level services. If your “Professional” grade sites go offline because Cloudflare botched a router upgrade or was hacked, you’re SOL and you do not get downtime credits, sorry. You can’t even design your architecture to resolve with alternate name servers or they will disable your domain. So if Cloudflare ever goes offline, your sites go offline with them, and there are no alternatives. If you use Cloudflare, their service becomes your Single Point of Failure.

I am not one to create drama, but this is an issue that none of the other Cloudflare “Pro” account users I’ve talked to were aware of. So, here is a recent email exchange with Cloudflare regarding a credit for having caused all of my sites to be offline on more than one occasion; this is not limited to the recent event with their routers.

Cloudflare: “I’ve reviewed your account and note that you currently have 3 Pro subscriptions with us. At this time we do not offer a guaranteed level of service or SLA for our Free or Pro plans… We are also investing a great deal of time and resources to ensure the resiliency of the network even in the event of localized failures that may happen from time to time.” So not only is there no SLA, but there can be localized failures from time to time that you also do not get credit for. That explains the monitoring anomalies I’ve seen: some of my sites sit in the same rack, a few even on the same servers, yet only the Cloudflare-enabled domains were reported offline while the others stayed available.

My response to Cloudflare: “I will not keep Cloudflare running for my sites. There are many reasons, but another one recently made itself known to me when I decided to add tertiary and quaternary DNS servers, yet I ran into the following technical limitation that precludes my ability to rely on Cloudflare for my back-end infrastructure domain: if I want to specify a 3rd or 4th DNS server as a backup resolver (like Route53 or my own servers running PowerDNS, for example), then the Cloudflare system complains and disables my domain. I understand why the system is designed this way — you want all traffic going through the CF system so that the features are executed and so forth. However, in the event of an issue like the previous outage, there is no fallback for users/systems to resolve my domain via an alternate DNS system. I am limited to two Cloudflare DNS servers and nothing more.

The way that the DNS requirement is setup makes Cloudflare an all or nothing solution – you either use Cloudflare for the domain or you do not. And, as 785,000+ sites experienced, this makes Cloudflare (no matter how resilient and improved after this incident) a single point of failure that system engineers and architects cannot design failover services around.

This is the second time that I have had issues with Cloudflare services not working correctly. The first was when one of my servers went offline and the “Always Online” feature didn’t do anything; the site was not kept online via cache even though there had been plenty of time for the crawlers to get the content (which is static, non-dynamic, and non-database-driven: simply a front page that is supposed to load fast and act as a click portal to our primary systems).

And now I have been seeing users connect to my site from countries that I have set up in the block list. I have a number of ‘trouble’ countries configured in Cloudflare to disallow access to my site, yet these users are connecting anyway. Clearly the country-blocking feature is broken as well.

I want to use Cloudflare. I want to love it. I want to tell everyone I know how great and useful it is. But after six months of using it on several sites, it has done nothing but cost me a lot of time trying out different configurations and wondering why feature x/y/z isn’t working as stated. Then there have been the outages from human error and incorrect ITIL process adherence.

So I will be setting up some alternate caching servers at different datacenters and moving some of my content onto a CDN. Cloudflare has failed and I am tired of wanting to like it.”


Building a MySQL Private Cloud: Step 1

Building clusters is usually a fun time. Here’s one of my setups at the Equinix LAX1 facility, used for VPN services, OpenVZ clustering, and general RADIUS and MySQL clustering integration. Once the clustering design is finalized (it’s still in flux while I try out different setups), I’ll post some physical+logical architecture diagrams to show “How to Build a Fault Tolerant Infrastructure for Virtualized MySQL NDB Cluster + Python-based VPN systems.” Stay tuned for more.

[Photo: LAX1 rack, front view]


OpenVZ and Amazon S3: how to solve the dreaded connection throttle failure

Sometimes we encounter odd application responses that seem to make no sense. One such issue relates to running virtual server instances (OS containers, not para-virtualized VMs) and attempting to back up their data to Amazon’s S3 cloud storage. For moderately sized virtual machines running MySQL databases or Python/PHP-based websites and code repositories, this can be an inexpensive, quickly provisioned, and easy way to provide disaster recovery backups in numerous geographic locations, since we generally want DR content to be located in a physically distant location. Nevertheless, we can encounter errors when using an S3 mount in a distant location from our server if the time zone/sync data is incorrect.

The commonly seen error is as follows – and it doesn’t give much information for troubleshooting and resolution.

WARNING: Upload failed:  ([Errno 32] Broken pipe)
WARNING: Retrying on lower speed (throttle=0.00)
WARNING: Waiting 3 sec...

The solution is seemingly unrelated to any network-related or file-system settings on the virtual machine or the host server. It has to do with the S3 storage buckets living in a different time zone than your server while the system is not synced to NTP pools. So, here is the solution for Red Hat/CentOS/Fedora/Scientific (for other Linuxes, just replace the package management commands as needed):

First we have to enable the OpenVZ container’s ability to use NTP. Add the following line to your /etc/vz/conf/101.conf file (where 101 in this example is the ID of your own container, which you can find via the command “vzlist”).

CAPABILITY=" SYS_TIME:on"
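
Alternatively, vzctl can write this setting for you; a one-liner that should be equivalent (assuming your vzctl version supports the --capability option, which stock OpenVZ does):

$ vzctl set 101 --capability sys_time:on --save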

Then restart the container(s) for the setting to take effect and log in to the container. You can either SSH in or enter the container from the main host.

$ vzctl restart 101
$ vzctl enter 101

On the VM itself, install the ntpdate package to be able to sync time data.

$ sudo yum install ntpdate

Here’s a sample ntp.conf for the NTP pool servers on CentOS 6.3. There are plenty of other configuration settings, but these are the basics. This file goes on the VM, not the host server.

$ sudo cat /etc/ntp.conf
driftfile /var/lib/ntp/drift
restrict default kod nomodify notrap nopeer noquery
restrict -6 default kod nomodify notrap nopeer noquery
restrict 127.0.0.1 
restrict -6 ::1
server 0.centos.pool.ntp.org
server 1.centos.pool.ntp.org
server 2.centos.pool.ntp.org
includefile /etc/ntp/crypto/pw
keys /etc/ntp/keys

Restart the ntpdate service on the VM to sync to the pool.

$ sudo service ntpdate restart
ntpdate: Synchronizing with time server:                   [  OK  ]

Add a cron job on the VM (the entry below is for /etc/crontab; if you use “crontab -e” instead, drop the “root” user field) to automatically sync the time every day.

# sync date/time with ntp pool
05 01 * * *	root /usr/sbin/ntpdate 0.centos.pool.ntp.org 2>&1 | /usr/bin/tee -a /var/log/messages

Now you can run S3 backups without throttling errors. Done and done. No more errors.
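
As a quick sanity check, re-run whatever s3cmd call was failing before; a minimal example (bucket name and paths hypothetical):

$ s3cmd put /vz/dump/vzdump-101.tar s3://my-dr-bucket/backups/

The broken-pipe/throttle retries should be gone now that the container’s clock matches what S3 expects.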


The InnoDB Quick Reference Guide is now available

I’m pleased to announce that my first book, the InnoDB Quick Reference Guide, is now available from Packt Publishing, and you can download it by clicking here. It covers the most common topics of InnoDB usage in the enterprise, including: a general overview of its use and benefits, detailed explanations of seventeen static variables and seven dynamic variables (you can peek at these on your own server; see the quick check after the chapter list), load testing methodology, maintenance and monitoring, as well as troubleshooting and useful analytics for the engine. The current version of MySQL ships with InnoDB as the default table engine, so whether you program your MySQL-enabled applications with PHP, Python, Perl or otherwise, you’ll likely benefit from this concise but comprehensive reference guide for InnoDB databases.

Here are the chapter overviews for reference:

  1. Getting Started with InnoDB: a quick overview of core terminology and initial setup of the testing environment.
  2. Basic Configuration Parameters: learn about the most common settings and prerequisites for performance tuning.
  3. Advanced Configuration Parameters: covers advanced settings that can make or break a high-performance installation of InnoDB.
  4. Load Testing InnoDB for Performance: learn all about general purpose InnoDB load testing as well as common methods for simulating production workloads.
  5. Maintenance and Monitoring: covers the important sections of InnoDB to monitor, tools to use, and processes that adhere to industry best practices.
  6. Troubleshooting InnoDB: learn all about identifying and solving common production issues that may arise.
  7. References and Links: informative data for further reading.
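
As mentioned above, here is a quick way to inspect the InnoDB variables the book covers; both are stock MySQL commands run from the shell (output varies by version and configuration):

$ mysql -e "SHOW VARIABLES LIKE 'innodb%';"
$ mysql -e "SHOW ENGINE INNODB STATUS\G"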

Bash scripting: ElasticSearch and Kibana init.d scripts

As a follow-up to the previous post about logstash, here are a couple of related init scripts for anyone implementing the open source log analytics setup that is explained over at divisionbyzero. These have been tested on CentOS 6.3 and are based on generic RC functions from Red Hat, so they will work with Red Hat, CentOS, Fedora, Scientific Linux, etc.
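
Installing either script follows the usual Red Hat init.d pattern; a minimal sketch using the elasticsearch script as the example (file name assumed):

$ sudo cp elasticsearch /etc/init.d/elasticsearch
$ sudo chmod +x /etc/init.d/elasticsearch
$ sudo chkconfig --add elasticsearch
$ sudo service elasticsearch start

The same four steps apply to the kibana script.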


Bash scripting: an improved init.d service for LogStash

LogStash is a great program, but I’m not going to get into that topic right now. Simply put, if you are using LogStash and feel like you need an init script to run it as a service, then here is the best one available (according to my personal functionality testing of all of the existing logstash init scripts available online): download the improved init script from my repo.

Note: this was originally written by Michael Ladd but lacked some functionality needed to run on CentOS 6.3; this is an improved version of that script.


Quick How-To for DRBD + MySQL + LVS

I wrote this up a while ago and decided that I didn’t want to lose it in a shuffle of documents during my transition to a new workstation. It covers the basics of setting up Heartbeat (LVS) + DRBD (block replication between active/passive master servers) + MySQL. This should give you a basic H/A system without the benefits of a SAN, but also without the associated cost. The validity of this setup for H/A purposes is highly dependent on your workload and environment; you should know the ins and outs of your H/A solution before deciding to blame the system for not performing as expected. As with all production systems, you should test, test, test and test some more before going live.
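
To give a flavor of what the PDF walks through, here is a minimal sketch of a DRBD resource definition for the MySQL data volume (hostnames, addresses, and device paths are hypothetical; the document below covers the full setup):

resource mysql {
  protocol C;             # fully synchronous replication, what you want for H/A
  device    /dev/drbd0;   # the replicated device the MySQL datadir lives on
  disk      /dev/sdb1;    # backing block device on each node
  meta-disk internal;
  on node1 {
    address 10.0.0.1:7788;
  }
  on node2 {
    address 10.0.0.2:7788;
  }
}

The active node mounts /dev/drbd0 at the MySQL datadir, and Heartbeat handles promoting the passive node and moving the service IP on failover.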

When I get around to it later I’ll post my How-To for setting up RHCS + SAN + MySQL. You can download the DRBD document PDF here: DRBD_LVS_Install-Configure_HowTo
