
Troubleshooting Linux Storage – My first book

When I told a buddy of mine, who is a fairly prolific author, that I wanted to write a book and asked for his advice, he said “Don’t”. During the process I sometimes wished I’d taken that advice, but as it progressed and neared completion I did feel a real sense of satisfaction. Now, I’m not claiming to be an author, and my style of writing is much the same as my blog articles: down to earth, without any fuss, and simply trying to be as clear and concise as possible.

It took a bit longer than initially anticipated, as my previous employer had some issues with me writing such a book, and I also got diagnosed with a nasty disease I had to conquer, but here we are. I could finally press the “Publish” button.

So what is it about? As the title says: Troubleshooting Linux Storage.

In my career as a support engineer, I’ve seen many issues pop up in a variety of circumstances at customers’ sites, ranging from very small shops to very large multinationals. A common factor has been that on many occasions there was confusion about what was actually happening and where the problems originated. Now, I’m far from claiming to be an expert at every layer of the Linux IO stack, but as I’ve been doing both storage and Linux for a fairly long time, I have a pretty good understanding of where to look when things go wrong, how to identify the problems and how to resolve them.

In the book, I’ve tried to capture a lot of what I know, and I hope it will help system administrators diagnose problems, resolve them and, based on these experiences, prevent them from happening again.

Is it a complete bible of everything that can go wrong? I think there would not be enough trees in the world to provide the paper to print it on, nor would you be able to physically lift the book. Even a Kindle version would seriously stretch the storage capacity of the device. As always, you have to make decisions about what is useful to write down and where to refer to other sources. Most of the things in the book are of a practical nature, centred around the art of troubleshooting. It does contain a fair number of links to other sources where needed.

As this is my first attempt at ever doing such a thing, I did not really want to go via one of the large publishing houses like O’Reilly or No Starch Press. Maybe that will change in the future. It also means that from a publishing perspective this has been a one-man job, and you may encounter some irregularities that I have not caught. When I do, these will be corrected ASAP.

The book can be purchased via Amazon.

It is also now available in digital format via Leanpub:

https://leanpub.com/troubleshootinglinuxstorage

I welcome any feedback, good or bad, and appreciate suggestions, so I can improve the book in future versions and help more Linux system administrators.

Kind regards

Erwin

Using systemd-resolved to optimise DNS resolution.

When you work from home and are required to use the corporate network, you’re often shoved into a dilemma where the VPN configuration that is pushed to your PC results in one of two modes: full-tunnel or split-tunnel.

Digging tunnels

A full-tunnel configuration is by far the most dreadful, especially when your VPN access-point is on the other side of the planet. Basically, all traffic to and from your system is pushed through that tunnel. This is even the case when a web-page is hosted next door to where you are sitting. Your requests to that webserver will first traverse the VPN connection to the other side of the planet, where your company’s proxies will retrieve the page via the public web, only to send it back to you via that VPN again. Obviously the round-trip and other delays result in abominable performance and a user experience that is excruciatingly painful.

A split-tunnel, however, is far more friendly. As I explained in one of my previous articles (here), only traffic destined for systems inside your corporate network is routed over the VPN; requests to other systems just traverse the public interweb.
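In routing terms, a split-tunnel typically boils down to a couple of specific routes over the tunnel interface while the default route stays on the physical interface. A minimal sketch (the interface names and corporate address ranges below are just examples, not taken from a real configuration):

```shell
# Route only the (example) corporate RFC1918 ranges over the VPN interface;
# everything else keeps following the default route on the physical interface.
ip route add 10.0.0.0/8 dev tun0
ip route add 172.16.0.0/12 dev tun0

# The default route remains untouched, e.g. something like:
# default via 192.168.1.1 dev wlan0
```

In practice the VPN client pushes these routes for you; the point is that only corporate destinations traverse the tunnel.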

Domain Name resolution

There is, however, one exception: DNS, i.e. the name-to-(IP-)number translation. Traditionally, Linux uses a system-wide resolver that looks in “/etc/resolv.conf” to find out what your DNS servers are and which domains to search, plus a few other options. That basically means that as soon as you have any VPN tunnel active, you always need to use your corporate DNS servers for every request, as your system does not really know which server is located where. There may even be a situation where your corporate DNS servers point to a different host for the same domain. You often see this where employees get additional functionality compared to external users, or where credential verification is bypassed because you already have an authorised session to the internal systems.
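For reference, such a traditional “/etc/resolv.conf” looks something like this (the addresses and domains here are made up for illustration):

```
# /etc/resolv.conf - illustrative example
nameserver 10.15.230.6
nameserver 10.15.230.7
search corp.com internal.corpnew.com
options timeout:2 attempts:2
```

Note that there is no way in this file to say “use these servers only for these domains”; the nameservers listed are used for everything.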

The drawback, however, is that sites outside your corporate network are also resolved via your company’s DNS servers. Not only may this limit performance from a resolver standpoint (remember that these DNS requests also have to traverse the same VPN tunnel), but the system you end up at may also not be the most appropriate one.

As an example: if you have an active VPN to your corp.com, even DNS queries for a web-site in your own country will first go to “Corp DNS”, which, if it does not already have the address cached, will forward the request to whatever “Corp DNS” has configured as its upstream DNS server (in this case Google). As you can see, you could have asked Google’s DNS servers yourself, but as your VPN session has set your resolver to use the Corp DNS, that does not happen. An additional point to be aware of is that, no matter which website you visit, your company will have a record of it, as most corporate regulations stipulate that actions done on their systems will be logged for whatever purpose they deem necessary. This may sometimes conflict with the privacy policies of different countries, but that is most often shuffled under the carpet and hidden in legal obscurity.

The above also means that when you have requests for sites that span geographies, you may not always get to the most optimal system. Many DNS systems are able to determine where a request is coming from and subsequently provide the IP address of a system that is closest to the requestor. As your request is fulfilled by your company’s DNS server on the other side of the planet, the web-server you are pointed at may also be there. No need to panic, as many of these environments have built-in smarts to redirect you to a more local system, but it nevertheless means this situation is far from optimal. What you’re basically after is the ability to, in addition to that split-tunnel configuration, direct DNS queries to the DNS servers which actually host the domains behind that VPN, and nothing else.

In the above case your Linux system has two interfaces: one physical (WiFi or Ethernet) and one virtual (the VPN, most often called tunX, where X is the VPN interface number).

Meet systemd-resolved

There are some Linux (or Unix) purists who shudder at the sight of systemd-based services, but I think most of them are actually pretty OK. Resolved is one of them.

What resolved allows you to do is assign specific DNS configurations to different interfaces in addition to generic global options.

As an example, here is the (slightly trimmed) status output on my system:

Global
    LLMNR setting: yes
    MulticastDNS setting: yes
    DNSOverTLS setting: no
    DNSSEC setting: allow-downgrade
    DNSSEC supported: no
    Fallback DNS Servers: 9.9.9.9
    DNSSEC NTA: 10.in-addr.arpa
                16.172.in-addr.arpa
                168.192.in-addr.arpa
                <snip>
                31.172.in-addr.arpa
                corp
                d.f.ip6.arpa
                home
                internal
                intranet
                lan
                local
                private
                test

Link 22 (tun0)
    Current Scopes: DNS LLMNR/IPv4 LLMNR/IPv6
    DefaultRoute setting: yes
    LLMNR setting: yes
    MulticastDNS setting: no
    DNSOverTLS setting: no
    DNSSEC setting: no
    DNSSEC supported: no
    Current DNS Server: 10.15.230.6
    DNS Servers: 10.15.230.6
                 10.15.230.7
    DNS Domain: corp.com
                internal.corpnew.com

Link 3 (wlp0s20f0u13)
    Current Scopes: DNS LLMNR/IPv4 LLMNR/IPv6
    DefaultRoute setting: yes
    LLMNR setting: yes
    MulticastDNS setting: no
    DNSOverTLS setting: no
    DNSSEC setting: yes
    DNSSEC supported: yes
    Current DNS Server: 192.168.1.1
    DNS Servers: 192.168.1.1
    DNS Domain: ~.
                ourfamily.int

As you can see, it has three sections. The global section caters for many default settings, which can be superseded by per-interface settings. I think the overview speaks for itself. All requests for the domains “corp.com” and “internal.corpnew.com” will be sent to one of the two DNS servers with the 10.15.230.[6-7] addresses. All my home internal requests, as defined by the “ourfamily.int” domain, are sent to the 192.168.1.1 address. The “~.” means all other requests.

That results in queries being answered like this:

[1729][erwin@monster:~]$ resolvectl query zzz.com
zzz.com: 10.xx.16.9 -- link: tun0
         10.xx.16.8 -- link: tun0
         172.xx.24.164 -- link: tun0
         172.xx.24.162 -- link: tun0
         10.xx.100.4 -- link: tun0
         10.xx.148.66 -- link: tun0
         10.xx.7.221 -- link: tun0
         10.xx.7.34 -- link: tun0
         10.xx.7.33 -- link: tun0
         10.xx.100.5 -- link: tun0

-- Information acquired via protocol DNS in 243.1ms.
-- Data is authenticated: no

If I were to use an external DNS server for that domain, it would return different addresses.

[1733][erwin@monster:~]$ dig @9.9.9.9 +short zzz.com
169.xx.75.34

(The above are not the real domains I queried, but I think you get the drift.)

Queries for non-corporate websites will be resolved via the WiFi interface (wlp0s20f0u13):

[1733][erwin@monster:~]$ resolvectl query google.com
google.com: 2404:6800:4006:809::200e -- link: wlp0s20f0u13
            216.58.203.110 -- link: wlp0s20f0u13

-- Information acquired via protocol DNS in 121.0ms.
-- Data is authenticated: no

As my home router has a somewhat more sophisticated setup, this also allows me to have all external DNS requests, i.e. those not destined for corp.com or corpnew.com, use a DNS-over-HTTPS or DNS-over-TLS configuration to bypass any ISP mangling.

Setup

Systemd-resolved is a systemd service (duhh) which needs to be enabled first with “systemctl enable systemd-resolved“. The main configuration file is /etc/systemd/resolved.conf; individual configuration files can also be stored in the resolved.conf.d subdirectory.
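As a hypothetical example, a drop-in file in that directory could set a few of the global options shown earlier (the file name and values are illustrative; see the resolved.conf man-page for the full list of options):

```
# /etc/systemd/resolved.conf.d/10-global.conf - illustrative drop-in
[Resolve]
FallbackDNS=9.9.9.9
LLMNR=yes
MulticastDNS=yes
DNSSEC=allow-downgrade
```

Options set in a drop-in override the same options in the main resolved.conf.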

-rw-r--r-- 1 root root 784 Oct 20 14:32 resolved.conf
drwxr-xr-x 2 root root 4096 Oct 20 14:24 resolved.conf.d/

The settings can also be applied interactively via the “resolvectl” command, which is what I have done. If your distro has NetworkManager installed, then NM can also automatically configure resolved via D-Bus calls.
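For instance, the per-link settings of the VPN tunnel shown earlier could be applied with something like the following (interface name, servers and domains taken from that example; these commands require root privileges and a running systemd-resolved):

```shell
# Send queries for the corporate domains to the corporate DNS servers,
# but only on the VPN interface
resolvectl dns tun0 10.15.230.6 10.15.230.7
resolvectl domain tun0 corp.com internal.corpnew.com

# Verify the per-link settings
resolvectl status tun0
```

Settings applied this way are per-link and disappear when the interface goes away, which is exactly what you want for a VPN tunnel.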

There is more to it than I can easily cover here, as this would pretty quickly become a re-wording of the man-page, which I try to avoid. At least I hope it has given you some idea of what you can do with “systemd-resolved”.

Kind regards,

Erwin

Getting rid of whitespace

No, this is not storage related, but more about coding, scripts, etc., and making sure your git repositories do not show up with huge diff sections you need to correct. Just a little tip and a “note to self”.

If you’ve ever been keen enough to not use an IDE for whatever language you use and kept to a real editor (Vim, obviously.. :-)), you may have encountered the phenomenon that whitespace at the end of lines is a nasty thing to look at once you start putting stuff into version control repositories like Subversion or Git. A little change from some copy-or-paste action may leave you with a “git diff” of a couple of hundred lines you need to correct.

To fix that, simply let Vim clear out all trailing whitespace (tabs, spaces, etc.) by having it removed before the actual write to disk.

To do that simply add

autocmd BufWritePre *.sh :%s/\s\+$//e

to your ~/.vimrc, and with every :w the substitute function driven by the regex after the colon will remove all trailing whitespace in shell scripts (*.sh). Obviously you can add every extension you need here. Very handy.
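If you also want to clean up files that are already in a repository, the same substitution can be run from the shell; a minimal sketch (the file name is just an example, and this uses GNU sed’s -i option):

```shell
# Create a small demo script with trailing spaces and a trailing tab
printf 'echo hello   \necho world\t\n' > demo.sh

# Strip trailing whitespace in place - same regex idea as the Vim autocmd
sed -i 's/[[:space:]]\+$//' demo.sh

cat demo.sh
# prints:
# echo hello
# echo world
```

Run it over all tracked shell scripts before a commit and your diffs stay clean.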

Cheers,

Erwin

Performance misconceptions on storage networks

The piece of spinning Fe3O4 (i.e. rust) is by far the slowest piece of equipment in the IO stack. Heck, they didn’t invent SSDs and flash for nothing, right? To overcome the terrible latency involved when a host system requests a block of data, there are numerous layers of software and hardware that try to reduce the impact of physical disk related drag.

One of the most important is caching. Whether that is CPU L2/L3 cache, DRAM cache, some hardware buffering device in the host system or even the huge caches in storage subsystems: all of these can, and will, be used by numerous layers of the IO stack, as each cache-hit prevents fetching data from a disk. (As an intro to this post you might read one I’ve written over here, which explains what happens where when an IO request reaches a disk.)

Continue reading

Open Source Storage (part 2)

Six years ago I wrote this article: Open Source Storage, in which I described that storage would become “Software Defined”. Basically, I predicted SDS before the acronym was even invented. What I did not see coming is that Oracle would buy Sun and by doing so basically kill off the entire “Open Source” part of that article, but hey, at least you can call yourself an America’s Cup sponsor and Larry Ellison’s yacht maintainer. 🙂

Fast-forward six years to 2015 and we see that the software-defined storage landscape has expanded massively. Not only is there a huge number of different solutions available now, but the majority of them have evolved into mature storage platforms with almost infinite scalability in capacity and performance.

Continue reading

Getting rid of browser cache pollution

I always try to keep a clean slate on my Linux box. Not only when it comes to security; one thing I really hate is the massive amount of stuff that gets downloaded as “chemical waste by-products”, like a gazillion revisions of Twitter and other (anti-)social media icons, pictures, style-sheets etc. etc. etc.

Obviously you can set limits on the size of the cache, but that does not prevent this pollution from being retained across reboots and clogging up the inode tables with useless entries.


Continue reading

Why Docker is the new VMware

Five years ago I wrote this article:

Server virtualisation is the result of software development incompetence

Yes, it has given me some grief, given the fact that 99% of respondents did not read beyond the title and made false assumptions, saying I accused these developers of being stupid. Tough luck. You should have read the entire article.

Anyway, in that article I did outline that virtualisation with the methodology of isolating entire operating systems in a container is a massive waste of resources, and that the virtualisation engine should have focused on applications and/or business functionality. It took a while for someone to actually jump into this area, but finally a new tool has come to life which does exactly that.

Continue reading

Blurry Fonts on X

This is not really a blog post, more a note to self. I was pulling my hair out because, after an update of the X server and GNOME on my Fedora 20 box, every font across windows, window titles and app screens more or less became unreadable: blurry and fuzzy characters all over the place that were only readable with a plus 9 left and minus 9 right pair of glasses. I had this in the past, and most of the time upgrading to the latest NVidia driver resolved the problem. Not so today. Reboots, kernel updates, NVidia driver updates, xorg.conf modifications and adjustments all led nowhere. I was at the brink of throwing the box out of the window when I read a one-liner on askubuntu.com which said:

“OK, so this is going to sound ridiculous. But I switched the screen itself off and on again and it is working now.”

This looked so incredibly unbelievable that I ran out into the street, wanted to start screaming but could restrain myself… (barely), went back in, turned my monitor off, then back on, and the problem had disappeared. Aaaarrrrghh…..

OK, coffee now… Have a nice day..

Cheers,

Erwin

Why partition alignment on disk matters (Linux)

Linux has been pretty good with and for storage. The sheer volume of options w.r.t. filesystems, volume managers, access methods (FC, iSCSI, NFS, DAS, etc.) and multi-pathing, but also the very broad support of the hardware ecosystem, is something to be proud of. The issue with storage support is that you ALWAYS have to maintain massive backward compatibility with previous generations of technology. Not only from a hardware perspective; the software side also needs to retain the older technology. I saw a video featuring Linus, Greg Kroah-Hartman, Sarah Sharp and Ted Ts’o over here, where Ted mentioned that KVM helped him massively with regression testing for the storage projects he’s involved in. (As you may know, Ted maintains the ext(2/3/4) filesystems, among other things.) That brings me to the bottleneck of history in a technology environment, and why the topic I described in the subject is important.

Continue reading