Tag Archives: logs

The first law of the Time Lords | Aussie Storage Blog

A buddy of mine posted this article and it reminded me of the presentation I did for the Melbourne VMUG back in April of this year.

The first law of the Time Lords | Aussie Storage Blog:

If you have ever worked in support (or had the need to check on events in general as an administrator) you know how important it is to have an accurate timestamp. Incorrect clock settings are a nightmare if you want to correlate events that are logged at different times and dates.

When you look at the hyper scale of virtualised environments you will see that the vertical stack of IO options is almost tenfold. Let's have a look from top to bottom where you can set the clock.

  1. Application
  2. VM, the virtual machine
  3. Hypervisor
  4. Network switches
  5. The 1st tier storage platform (NAS/iSCSI)
  6. A set of FC switches
  7. The second tier storage platform
  8. Another set of FC switches
  9. The virtualised storage array
Which in the end might look a bit like this. (Pardon my drawing skills)
As you can imagine, it’s hard enough to start figuring out where an error has occurred, but when all of these layers have different time settings it’s virtually impossible to dissect the initial cause.
So what do you set on each of these? That brings us to the question of “What is time?”. A while ago I watched a video of a presentation by Jordan Sissel (who is working full-time on an open-source project called LogStash). One of his slides outlines the differences in timestamps:
So besides the different time formats you encounter in the different layers of the infrastructure, imagine what it is like to first get all of these back into a human-readable format and then aligned across the entire stack.
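To give a feel for that alignment work, here is a minimal Python sketch that brings a few differently formatted timestamps back to UTC. The three formats and the `to_utc` helper are assumptions made up for illustration; they are not taken from the slide, and real devices will use other layouts. Timestamps that carry no offset can only be placed correctly if you already know which timezone the device was set to, which is exactly where the trouble starts.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical sample layouts, tried in order. Real equipment will differ.
FORMATS = [
    "%Y-%m-%dT%H:%M:%S%z",  # ISO 8601 style with offset: 2012-08-01T10:15:00+1000
    "%Y/%m/%d-%H:%M:%S",    # no offset in the string:    2012/08/01-10:15:00
]


def to_utc(raw, assumed_offset_hours=0):
    """Return an aware UTC datetime for one raw timestamp string.

    assumed_offset_hours is only used when the string carries no offset;
    it is a guess the caller has to supply, not something we can detect.
    """
    for fmt in FORMATS:
        try:
            parsed = datetime.strptime(raw, fmt)
        except ValueError:
            continue  # not this layout, try the next one
        if parsed.tzinfo is None:
            local = timezone(timedelta(hours=assumed_offset_hours))
            parsed = parsed.replace(tzinfo=local)
        return parsed.astimezone(timezone.utc)

    # Last resort: treat the value as seconds since the Unix epoch.
    try:
        return datetime.fromtimestamp(float(raw), tz=timezone.utc)
    except ValueError:
        raise ValueError(f"unrecognised timestamp: {raw!r}")


if __name__ == "__main__":
    samples = [
        ("2012-08-01T10:15:00+1000", 0),
        ("2012/08/01-10:15:00", 10),  # we have to know this box runs at UTC+10
        ("1343780100", 0),
    ]
    for raw, offset in samples:
        print(raw, "->", to_utc(raw, offset).isoformat())
```

All three samples describe the same instant, but only because the offset of the second one was supplied by hand. If that device had been left on a wrong timezone, its events would land ten hours away from everything else.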
While we’re not always in a position to modify the time/date format, we can make sure that at least the time setting is correct. In order to do that, make sure you use NTP and also set the correct timezone. This way the clocks in the different layers of the stack across the entire infrastructure stay aligned and correct.
You will help yourself and your support organisation a great deal.
Thanks,
Erwin van Londen

The insanity of sanity. (or am I getting insane?)

What would you say if you were having the following discussion? 

“Help, I have a problem.”
OK, so what is the problem? 
Something doesn’t work.
Sorry, what doesn’t work? 
I can’t tell you, it’s classified.
Can you send me the logs? 
Yes but I have to sanitise them.
Uhhmmm, so this means you’re sending me incomplete logs? 
Yes, I have to remove all references to system names, IP addresses, WWNs, connection diagrams and everything else that might in the smallest way lead to identification of a system or process.
So basically you can only send me information that has events in them?
Uhmm, yes.
But these cannot tell me anything.
Uhmm, yes.
So how am I supposed to help you?
By fixing my problem.

And this discussion goes round and round.

I can understand that some information is classified, but sanitising to a level where even the slightest bit of information is yanked through the “sed -r 's/anything which might represent an issue/XXXX/g'” stream editor will most certainly prolong any form of proper analysis, and your problem will not be fixed.

Try to determine to what extent you need to sanitise your system dumps, and make sure the information which is needed to do a proper analysis stays in those logs.
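One possible middle ground, sketched here in Python, is to replace each identifier with a consistent placeholder instead of a blanket XXXX. The regular expressions, the placeholder names and the `make_sanitiser` helper are all assumptions for illustration, and whether this is acceptable under your security policy is something only you can decide. The point is that the same address always maps to the same token, so whoever reads the logs can still see which events belong to the same port or host.

```python
import re

# Simplified patterns, assumed for this sketch: dotted IPv4 addresses and
# colon-separated 8-byte WWNs. Real logs may need more (hostnames, IPv6, ...).
IP_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
WWN_RE = re.compile(r"\b(?:[0-9a-fA-F]{2}:){7}[0-9a-fA-F]{2}\b")


def make_sanitiser():
    """Return (sanitise, mapping); mapping stays with the customer."""
    mapping = {}

    def token_for(kind, value):
        key = (kind, value.lower())  # treat 2B and 2b as the same WWN
        if key not in mapping:
            count = sum(1 for k in mapping if k[0] == kind) + 1
            mapping[key] = f"{kind}_{count}"
        return mapping[key]

    def sanitise(line):
        line = WWN_RE.sub(lambda m: token_for("WWN", m.group()), line)
        line = IP_RE.sub(lambda m: token_for("IP", m.group()), line)
        return line

    return sanitise, mapping


if __name__ == "__main__":
    sanitise, mapping = make_sanitiser()
    print(sanitise("login from 10.0.0.5 to 50:06:0e:80:05:27:2b:00"))
    print(sanitise("timeout on 50:06:0E:80:05:27:2B:00 via 10.0.0.9"))
```

The mapping table never has to leave the building, yet the second line can still be tied to the same WWN as the first.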

Thanks
Erwin