06 Feb 2012
Planet Grep
FOSDEM organizers: Thank you, volunteers!
FOSDEM would like to thank all volunteers who helped make our conference possible again. The busload of students that helped with the setup, the numerous volunteers that reacted to the call for volunteers, the people who spontaneously showed up at the infodesk offering their services. The regular veterans, the new blood. You all did a splendid job and I sincerely hope to see you all again next year.
Thanks, guys. Couldn't have done it without you!
06 Feb 2012 11:40am GMT
05 Feb 2012
Frederic Descamps: Fosdem 2012 - Pictures
I already uploaded the pictures of Fosdem and especially the MySQL & Friends devroom:
Thank you to all visitors and speakers!
I hope you enjoyed it and see you next year!
05 Feb 2012 10:30pm GMT
FOSDEM organizers: Error on printed schedules
We have discovered that the printed schedules end about an hour before the conference is scheduled to end. We are still trying to decide whether the schedule is too long or the paper too short.
Note that at 17:00 in Janson, Bdale Garbee will present his Freedom, Out of the Box! keynote. This will be followed immediately by the closing talk and FOSDEM dance.
Be there ... or be elsewhere!
05 Feb 2012 12:27pm GMT
FOSDEM organizers: Video feedback?
FOSDEM is streaming video from a select number of rooms this year (see the URLs if you want to watch).
Watching the stream from home? Love it? Hate it? Feedback is much appreciated!
You can join us through IRC: #fosdem on Freenode, or (with slightly higher latency) use hashtag #fosdemvideo on twitter.
Thanks!
05 Feb 2012 10:24am GMT
Steven Wittens: A Useful BitTorrent Analogy
The first successful commercial photo copier, the Xerox 914.
BitTorrent has been around for over a decade now. And yet, when mentioned in the media, it's pretty much universally associated with piracy and illegal file sharing.
Just the other day, I saw a journalist write proudly: "No, I don't have a Torrent program and I'm not downloading one." A journalist! Someone who is supposed to be an expert at retrieving information and sharing it!
BitTorrent is not scary; what's more, it actually generates the majority of traffic on the internet. In the 21st century it should be a tool that sits on your digital utility belt, not something you wouldn't touch with a 10-foot pole. So here's a simple analogy to help understand it.
· • ·
Imagine a budget-starved teacher needs to hand out notes for class, but can only afford one copy. The document is 10 pages long, and there are 10 students who each need a full set.
The teacher could just give the notes to one student, and ask him to make all the copies, but that would only shift the burden, leaving him to pay for all 100 pages.
Instead, the teacher has an idea. She hands page 1 to student #1, page 2 to student #2, and so on, and tells each student to make 10 copies of their single page. The next week, the students can distribute them amongst themselves before class, and everyone has a full set. Nobody has to pay for more than their own 10 pages.
Everyone's happy: the teacher gets to share her knowledge cheaply, and the students don't mind paying for their own copies.
In the middle of the term, a new student joins. She could borrow someone else's big pile of notes, and copy the entire stack of paper, but that would mean she would have to pay for it all, and she's on a budget too.
So instead, she just goes around and asks each student to make a single copy of the pages they were assigned previously. The next week, she collects all the copies, and gets a full set without even bothering the teacher.
She gets a free pass to get up to speed, but the other students don't mind chipping in. That's because she immediately joins the game and can make copies too. The teacher can now hand out one page extra each week, or decide to give one student a free pass. If more students join, it works better and better.
Now instead, imagine that students join and leave the class every single day, and the teacher isn't quite so organized. She just puts her big stack of notes on the desk, and tells everyone they can take any page they want, as long as they promise to immediately make copies for anyone who asks. The students are all friendly, and make sure to keep each other in the loop about which pages everyone has. Both the originals and the copies are copied as many times as needed.
· • ·
That's BitTorrent in a nutshell. For any given class (i.e. a file that people are interested in), a cloud of students forms (i.e. the peers in the so-called peer-to-peer network). The peers compare notes, see which pieces they are missing, and swap copies with each other. Eventually, the teacher (a.k.a. the seeder) can leave, taking her original copy with her, and the system will keep working. As long as there is at least one copy of every page in the room, the students can make more, and the full set lives on.
This is pretty much the only way you can effectively distribute a massive archive of sensitive data to thousands or millions of people, without incurring massive bills. You can't use free or ad-supported services, as the material would get taken down instantly due to its sensitive nature. And you can't host it directly, as that would leave a trail pointing back to you.
With BitTorrent, your initial group of 'students' can be sworn to secrecy. After the initial round of copying, the teacher sneaks out, and the students just pin a notice on the bulletin board: "We have copies of The Forbidden Secrets by Dr. X. Come see us." Nobody claims to know who Dr. X is. Ideas and information flow freely, without censorship.
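The classroom analogy can be played out in a few lines of Python. This is only a toy sketch of the page-swapping idea, not the real BitTorrent protocol: the function name, the "one page per student per round" rule and the random choice of classmate are all assumptions made for illustration.

```python
import random

def swap_pieces(num_pieces=10, num_peers=10, seed=0):
    """Toy model: peers trade pages until everyone holds the full set.

    Returns (rounds, have), where have[i] is the set of pages peer i holds.
    """
    rng = random.Random(seed)
    # The teacher hands page i to student i (wrapping around if needed),
    # then leaves: from here on only the students hold copies.
    have = [set() for _ in range(num_peers)]
    for piece in range(num_pieces):
        have[piece % num_peers].add(piece)

    everything = set(range(num_pieces))
    rounds = 0
    while any(h != everything for h in have):
        rounds += 1
        # Freeze what everybody had at the start of the round.
        snapshot = [set(h) for h in have]
        for peer in range(num_peers):
            missing = everything - snapshot[peer]
            if not missing:
                continue
            # Ask one random classmate for a single page we still lack.
            other = rng.choice([p for p in range(num_peers) if p != peer])
            offered = missing & snapshot[other]
            if offered:
                have[peer].add(rng.choice(sorted(offered)))
    return rounds, have

rounds, have = swap_pieces()
print(f"everyone has all pages after {rounds} rounds")
```

Note that the "teacher" never appears after the initial hand-out: as long as every page exists somewhere in the room, the loop still ends with everybody holding a full set.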
05 Feb 2012 8:00am GMT
04 Feb 2012
Wim Coekaerts: Changing database repositories in Oracle VM 3
At home I have a small Atom-based server that was running Oracle VM Manager 3, installed using simple installation. Simple installation is the option where you just enter a password and the Oracle VM Manager installer installs the Oracle XE database, WebLogic Server and the Oracle VM Manager container. The same password is used for the database user, Oracle VM Manager database schema user, weblogic user and admin user for the manager instance.
The manager instance stores its data as objects inside the database. To do that, there is something called a datasource defined in weblogic during installation. It's basically a JDBC connection from weblogic to the database. This DS requires the following information: database hostname, database instance name, database listener port number, schema username and schema password. In my default install this was localhost, XE, 1521, ovs, mypassword.
Now that I re-organized my machines a bit, I have a larger server that runs a normal database 11.2.0.3, which I also happen to use for EM12c. So I figured I would take some load off the little atom server, keep it running Oracle VM Manager but shut down XE and move the schema over to my dedicated database host. This is a straightforward process so I just wanted to list the steps.
1) Shut down Oracle VM Manager so that it does not continue updating the repository.
as root:
    /etc/init.d/ovmm stop
2) Export the schema user using the exp command for Oracle XE.
as oracle:
    cd /u01/app/oracle/product/11.2.0/xe
    export ORACLE_HOME=`pwd`
    export ORACLE_SID=XE
    export PATH=$ORACLE_HOME/bin:$PATH
    exp
(enter user ovs and its password)
export user (option 2)
export everything including data
This will create (by default) a file called expdat.dmp. Copy this file over to the other server with the other database.
The schema name is also in /u01/app/oracle/ovm-manager-3/.config (OVSSCHEMA).
3) Shut down oracle-xe as it's no longer needed.
as root:
    /etc/init.d/oracle-xe stop
4) Import the ovs user into the new database. I like to do it as the user, so I simply pre-create the schema before starting the import.
as oracle:
    sqlplus '/ as sysdba'
    create user ovs identified by MyPassword;
    grant connect,resource to ovs;
At this point, run the imp utility on the box to import the expdat.dmp. The import asks for username/password; enter ovs and its password. Answer yes to importing all data, tables and content.
At this point you have a good complete repository. Now let's make the Oracle VM Manager weblogic instance point to the new database.
5) On the original system, restart weblogic.
as root:
    /etc/init.d/ovmm start
Wait a few minutes for the instance to come online.
6) Use the ovm_admin tool.
as oracle:
    cd /u01/app/oracle/ovm-manager-3/bin
    ./ovm_admin --modifyds orcl wopr8 1521 ovs mypassword
My new host name for the 11.2.0.3 database is called wopr, the database instance is orcl and listener is still 1521 with schema ovs.
The admin tool asks for a password; this is the weblogic user password. In a simple install, this would be the same as your admin or ovs account password.
7) Restart to have everything take effect.
as root:
    /etc/init.d/ovmm stop ; sleep 5 ; /etc/init.d/ovmm start
8) Edit the config file and update the new data.
    vi /u01/app/oracle/ovm-manager-3/.config
modify:
    DBHOST=
    SID=
    LSNR=
    OVSSCHEMA=
and leave the rest as is. That should do it!
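If you would rather script the last step than edit the file by hand, something like the following could work. It is a rough sketch that assumes the .config file is made of plain KEY=VALUE lines (the post only shows the key names); the function name update_config is made up for this example, and you should keep a backup of the file before trying it.

```python
from pathlib import Path

def update_config(path, updates):
    """Rewrite KEY=VALUE lines whose key is in `updates`; keep other lines as is."""
    config = Path(path)
    new_lines = []
    for line in config.read_text().splitlines():
        key, sep, _old_value = line.partition("=")
        # Only touch lines that really look like KEY=VALUE for a known key.
        if sep and key.strip() in updates:
            line = f"{key.strip()}={updates[key.strip()]}"
        new_lines.append(line)
    config.write_text("\n".join(new_lines) + "\n")

# Example call (values taken from the steps above; adjust to your own setup):
# update_config("/u01/app/oracle/ovm-manager-3/.config",
#               {"DBHOST": "wopr", "SID": "orcl", "LSNR": "1521", "OVSSCHEMA": "ovs"})
```

Keys that are not listed in the dictionary are left untouched, which matches the "leave the rest as is" advice.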
04 Feb 2012 7:36pm GMT
Peter Van Eynde: IPv6 versus IPv4 at fosdem :S

how so?
pevaneyn-mac:wireshark pevaneyn$ traceroute v4.fr.ipv6-test.com
traceroute to v4.fr.ipv6-test.com (46.105.61.149), 64 hops max, 52 byte packets
1 193.191.79.254 (193.191.79.254) 6.215 ms 0.282 ms 0.244 ms
2 ge.ar1.brucam.belnet.net (193.191.4.49) 0.350 ms 0.325 ms 0.365 ms
3 10ge.cr2.bruvil.belnet.net (193.191.16.189) 1.143 ms 0.964 ms 0.994 ms
4 ovh.bnix.net (194.53.172.70) 2.396 ms 1.900 ms 1.942 ms
5 rbx-g2-a9.fr.eu (94.23.122.137) 5.712 ms 4.725 ms 4.794 ms
6 rbx-2-6k.fr.eu (91.121.131.9) 10.489 ms 15.149 ms
rbx-1-6k.fr.eu (91.121.131.13) 50.591 ms
7 rbx-26-m1.fr.eu (213.251.191.201) 4.448 ms
rbx-26-m1.routers.ovh.net (213.251.191.73) 4.754 ms 4.996 ms
8 eight.t0x.net (46.105.61.149) 3.950 ms 3.975 ms 4.067 ms
pevaneyn-mac:wireshark pevaneyn$ traceroute6 v6.fr.ipv6-test.com
traceroute6 to v6.fr.ipv6-test.com (2001:41d0:1:d87c::7e57:1) from 2001:6a8:1100:beef:114f:fb76:XXXX:XXXX, 64 hops max, 12 byte packets
1 2001:6a8:1100:beef::1 0.558 ms 0.674 ms 0.507 ms
2 2001:6a8:1000:800f::1 0.370 ms 0.414 ms 0.393 ms
3 10ge.cr2.bruvil.belnet.net 1.106 ms 1.112 ms 1.034 ms
4 ae0-200.bru20.ip6.tinet.net 1.620 ms 1.572 ms 1.523 ms
5 xe-2-1-0.ams20.ip6.tinet.net 6.063 ms
xe-5-2-0.ams20.ip6.tinet.net 5.999 ms
xe-8-1-0.ams20.ip6.tinet.net 6.002 ms
6 * * *
7 * * *
8 * * *
9 fra-5-6k.de.eu 25.602 ms * 30.531 ms
10 rbx-g2-a9.fr.eu 31.890 ms 27.448 ms 26.656 ms
11 rbx-1-6k.fr.eu 29.996 ms
rbx-2-6k.fr.eu 33.715 ms
rbx-1-6k.fr.eu 26.735 ms
12 2001:41d0:1:d87c::7e57:1 25.498 ms 31.873 ms 30.815 ms
So a trip around Europe. But IPv6 need not be slow:
pevaneyn-mac:fosdem pevaneyn$ traceroute6 www.debian.org
traceroute6: Warning: www.debian.org has multiple addresses; using 2001:858:2:2:214:22ff:fe0d:7717
traceroute6 to www.debian.org (2001:858:2:2:214:22ff:fe0d:7717) from 2001:6a8:1100:beef:114f:fb76:XXXX:XXXX, 64 hops max, 12 byte packets
1 2001:6a8:1100:beef::1 0.640 ms 1.731 ms 0.607 ms
2 2001:6a8:1000:800f::1 0.491 ms 0.356 ms 0.387 ms
3 2001:6a8:1000:2::2 0.442 ms
10ge.cr2.bruvil.belnet.net 1.081 ms 0.989 ms
4 10ge.cr1.brueve.belnet.net 1.979 ms
10ge.cr1.brueve.belnet.net 1.718 ms 1.479 ms
5 20gigabitethernet1-3.core1.ams1.ipv6.he.net 4.766 ms 8.460 ms 7.190 ms
6 10gigabitethernet1-1.core1.fra1.he.net 16.977 ms 20.783 ms 11.835 ms
7 ge2-19-decix-ipv6-c1.ix.sil.at 70.823 ms 42.928 ms 45.012 ms
8 2001:858:66:203:215:2cff:fe8d:bc00 27.416 ms 26.934 ms 28.561 ms
9 ip6-te1-4-c2.oe3.sil.at 26.776 ms 26.413 ms 26.856 ms
10 2001:858:66:22c:217:fff:fed4:6000 27.156 ms 27.472 ms 26.778 ms
11 englund.debian.org 27.211 ms 27.641 ms 27.823 ms
pevaneyn-mac:fosdem pevaneyn$ traceroute www.debian.org
traceroute: Warning: www.debian.org has multiple addresses; using 86.59.118.148
traceroute to www.debian.org (86.59.118.148), 64 hops max, 52 byte packets
1 193.191.79.254 (193.191.79.254) 0.619 ms 0.254 ms 0.255 ms
2 ge.ar1.brucam.belnet.net (193.191.4.49) 0.432 ms 0.385 ms 0.448 ms
3 10ge.cr1.brueve.belnet.net (193.191.16.205) 1.153 ms 1.557 ms 0.951 ms
4 nl-asd-dc2-ias-csg01.nl.kpn.net (195.69.144.144) 5.608 ms 5.442 ms 10.251 ms
5 * * *
6 ffm-s1-rou-1021.de.eurorings.net (134.222.229.10) 38.019 ms 37.926 ms
ffm-s1-rou-1021.de.eurorings.net (134.222.231.250) 39.953 ms
7 ffm-s1-rou-1022.de.eurorings.net (134.222.228.86) 40.075 ms
ffm-s1-rou-1022.de.eurorings.net (134.222.228.90) 38.180 ms
ffm-s1-rou-1022.de.eurorings.net (134.222.228.86) 42.755 ms
8 mchn-s1-rou-1022.de.eurorings.net (134.222.228.194) 33.019 ms 33.211 ms 37.045 ms
9 wien-s2-rou-1002.at.eurorings.net (134.222.228.46) 39.827 ms 37.795 ms 39.839 ms
10 wien-s2-rou-1041.at.eurorings.net (134.222.123.242) 37.581 ms 37.633 ms 39.505 ms
11 sil.cust.at.eurorings.net (134.222.123.150) 37.654 ms 35.650 ms 35.521 ms
12 englund.debian.org (86.59.118.148) 38.009 ms 38.124 ms 40.628 ms
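To compare such traces without eyeballing them, the round-trip times of the final hop can be pulled out with a small script. This is just a sketch: it assumes the output format shown above, the helper names are invented here, and it only looks at the last line that contains timings (so a final hop that is split over several lines is only partly counted).

```python
import re

# A number directly followed by " ms", e.g. "3.950 ms"
RTT = re.compile(r"(\d+(?:\.\d+)?) ms")

def hop_rtts(line):
    """Return all round-trip times (in ms) found on one traceroute line."""
    return [float(value) for value in RTT.findall(line)]

def last_hop_mean(trace_output):
    """Mean RTT of the last line that contains at least one timing, or None."""
    for line in reversed(trace_output.strip().splitlines()):
        rtts = hop_rtts(line)
        if rtts:
            return sum(rtts) / len(rtts)
    return None

v4 = " 8 eight.t0x.net (46.105.61.149) 3.950 ms 3.975 ms 4.067 ms"
v6 = "12 2001:41d0:1:d87c::7e57:1 25.498 ms 31.873 ms 30.815 ms"
print(last_hop_mean(v4), last_hop_mean(v6))
```

Fed with the last lines of the first two traces, it shows the IPv6 path to the same machine taking several times longer than the IPv4 one.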
This entry was originally posted at http://pvaneynd.dreamwidth.org/148844.html. Please comment there using OpenID.
04 Feb 2012 4:13pm GMT
FOSDEM organizers: FOSDEM dance
Unfortunately, due to time constraints we were unable to entertain the crowd with our usual FOSDEM dance.
To make up for this, we have rescheduled it to after the closing talk.
04 Feb 2012 11:14am GMT
Frank Goossens: Fiesta: WP YouTube Lyte reaches 1.0.0
I just released the one dot ohhhh dot ohhhhhhhhhh version of WP YouTube Lyte!
From the changelog:
- new: also works on (manual) excerpts; just add a httpv link to the "excerpt" field on the post/page admin (based on feedback from Ruben@tuttingegneri)
- new: if youtube-url contains "start" or "showinfo" parameters, these are used when playing the actual video. This means that you can now jump to a specific time in the YouTube video or stop the title/author from being displayed (based on feedback from a.o. Miguel and Josh D)
- update: javascript now initiates either after full page load or after 1 second (whatever comes first), thus avoiding video not showing due to other requests taking too long
- update: bonus feature stops lockerz.com tracking by addtoany (you'll still want to hide the "earn pointz" tab though)
- bugfix: prevent the playing video from appearing in front of e.g. a dropdown-menu or lightbox (thanks to Matt Whittingham)
- bugfix: solve overlap between player and text when option was set not to show links (reported by Josh D)
And an appropriate vid to go with this new release:
04 Feb 2012 7:23am GMT
03 Feb 2012
FOSDEM organizers: Schedule changes
The following are last-minute changes and are not in the booklet or printed schedule:
Saturday
Opening talk:
- only in Janson
Open Mobile Linux devroom:
- new: 'Clouds over computing' by Jens Wiik at 18:00
Virtualization and Cloud devroom:
- cancelled: 'Application scheduling on OpenStack' at 12:00
Sunday
Telephony and Communications devroom:
- talk swap: 'Mobicents, TelScale and RestComm' now at 10:30
- talk swap: 'Enhancing FreePBX with Adhearsion' now at 13:20
Free Java devroom:
- 'OpenJDK on ARM: Quo vadis' not 60 minutes but 30 minutes
Perl devroom:
- new: 'LedgerSMB: Open source accounting running on Perl' at 12:25
Graph Processing devroom:
- replaced: Bio4j talk by 'Birds of a feather - Graph processing, future trends!' at 11:10
03 Feb 2012 8:55pm GMT
Claudio Ramirez: Perl devroom @ FOSDEM2012
Just a short reminder of the Perl talks at FOSDEM2012.
The Perl dev-room will be held this Sunday, February 5th, from 9h to 17h in room AW1.121. We have a wide range of talks. Some talks target Perl programmers with subjects ranging from a beginner to an advanced level. Other talks don't focus on the language itself, but rather on projects that use Perl as a building block.
So please, drop by if you are at FOSDEM…
Room: AW1.121
Sunday 2012-02-05

| Event | Speaker | Room | When |
|---|---|---|---|
| Welcome to the Perl devroom | Claudio Ramirez | AW1.121 | 09:00-09:05 |
| Moose Primer | Nicholas Perez | AW1.121 | 09:05-09:25 |
| Advanced Moose Techniques | Nicholas Perez | AW1.121 | 09:35-09:55 |
| Perlude: a taste of Haskell in Perl | Marc Chantreux | AW1.121 | 10:05-10:45 |
| Perlito | Flávio Glock | AW1.121 | 11:05-11:45 |
| The LemonLDAP::NG Project | Clément Oudot | AW1.121 | 11:55-12:15 |
| LedgerSMB: Open source accounting running on Perl | Erik Huelsmann | AW1.121 | 12:25-12:45 |
| Modern PerlCommerce | Stefan Hornburg | AW1.121 | 13:25-14:05 |
| Rapid real-world testing using git-deploy | Ævar Arnfjörð Bjarmason | AW1.121 | 14:15-14:35 |
| POSIX::1003 | Mark Overmeer | AW1.121 | 15:00-15:40 |
| The FusionInventory Project | Guillaume Rousse | AW1.121 | 15:50-16:10 |
| Using Moose objects with Memcached | Marius Olsthoorn | AW1.121 | 16:20-16:40 |
03 Feb 2012 8:29pm GMT
Xavier Mertens: Get The Most of Your Monitoring/Security Tools!
The idea of this article popped into my mind after a colleague of mine asked me to investigate a security incident. Nothing brand new: a customer's server, not properly patched and secured, was pwned. I found that the server was hit by the JBoss worm which started to spread in October 2010. Then the server started to scan for other victims, etc. Why the server was not patched and why it was able to access the Internet directly, I don't know. I won't start a new debate here. I would just like to insist on the ways (read: tools) that can be used to detect such an incident at the right time.
When I started my investigations, I had a limited number of data sources: the firewall logs and a network monitoring appliance. There was no log management solution and the server was turned off "to avoid more problems" (OMG!). The firewall logs of course gave me some relevant information, but what about the network monitoring appliance? This is the same kind of appliance that I'm using during the BruCON conference to keep an eye on the visitors' traffic. Very nice statistics can be generated. Basically, this appliance performs three tasks:
- Collection of all network flows + statistics (like Netflow)
- IDS (packets are analyzed via a built-in Snort)
- Web categorization
My investigations continued on this appliance and, as you can imagine, I found a multitude of evidence:
- Snort alerts (IRC traffic, id, wget, root alerts)
- Unusual traffic from servers to the Internet
- Suspicious web sites (domains & categories)
By having a look at the information reported by the appliance, the customer could have been alerted to the attack at an early stage (even in real-time!). But those features were simply… not used! The appliance was installed to monitor the network performance, that's it! But it could do much more!
That's an effect of the "Microsoft Syndrome"! What is this? I found a good definition on computerworld.com:
"There are several symptoms. One is when a tech company becomes so successful in a market and grows so quickly that it overlooks potential new markets. Another is when a tech company gets so large that it becomes increasingly difficult for it to innovate."
From my point of view, I would like to extend this definition to the technical aspect of IT products:
"Another symptom is when software becomes so complex that you only use a few percent of its features and forget or don't know how to use the others."
A typical example is Microsoft Word. I'm a Word user but, honestly, I probably use 10% of all the features! Sometimes, I'm working on RFPs which go very deep in the feature requirements and, finally, most of them will remain unused or unimplemented.
I think it's time to recall the principle of "more with less". Implementing security solutions is very expensive and budgets are often frozen or reduced. If you put some (lots of) bucks into a solution, be sure to use it at 100%! Read the manuals (you know, "RTFM!"), attend trainings, invest some time! Sometimes, cool features could be used for other purposes and increase the ROI! This reflection goes in the same direction as one of my previous articles about implementing security controls using Nagios.
03 Feb 2012 4:33pm GMT
Frank Marien: Extremon Unveiled
Ah, "Monitoring"
It certainly means different things to different people:
- As sysadmins, we want to know our systems are OK, to the slightest detail, and if not, what is wrong. Preferably before, or at least while, it happens.
- As (enlightened) developers we want to be able to follow our application's behaviour in production.
- As service managers we want to know if we're delivering the service as agreed: what's up, what's down, for how long, how slow, who's to blame.
- As managers we might want to know the bottom line, such as how many downloads or sales, in a pretty Widget.
You can probably think of a few more. And you're probably not happy with what you have (or you wouldn't have monitoringsux), which is most likely different systems running beside each other, with different paradigms, platform-dependent APIs (if any), and different Web GUIs with linear lists and state colors that you force to refresh every few minutes, and even then show old information, and that hold credentials to authenticate to the systems they monitor, making them dangerous points of failure in terms of security.
Depending on what you're trying to monitor, you may be OK with all of these.
But if you're like us, you'll end up in a multi-everything (platform, application, networks, silos, sites, policies) environment with no end of interdependencies, where at least some applications are interactive and time-critical, and the sysadmins and developers are colleagues, horizontal team members, or all the same people. This is the type of environment that we're growing Extremon for.
Taking 3 important headlines from the Extreme Monitoring Manifesto
Live, with Subsecond temporal resolution
Most of the data you're gathering will be required for different purposes. The service response time and validity you're testing in a functional test tells the sysadmins in the data center that it's fast enough, tells the developer that his caching strategy works in production, tells the service manager that you're OK with the SLA, tells the 1st and 2nd line support *at one glance* that the problem isn't the server, etc. So it makes sense to gather the data only once, which gives you breathing room to gather it more intensely. I propose starting at one probe per second, which is peanuts for most modern systems, but which will give you data points at the highest resolution you'll ever want (you can always average, etc. over longer periods for different uses). I find that services that have issues with one-second probes are in deep trouble anyway, and should be rethought. Of course you shouldn't come up with the heaviest possible data or query set on purpose. But for normal use, 1/sec is really nothing.
Agent Push really is the only option for system metrics, at that speed, and that's fine: Provisioning agents is a near zero cost game given DevOps practice, agent push solves the monitoring security issue in one fell swoop, requiring no connections from the monitoring hosts to the agent, hence, no authentication, no technical users, no flaws to exploit, and no endless login/measure/logout sequences wasting CPU slices and network traffic.
I currently favour collectd because it's fast, light, pluggable, and has a very efficient network protocol. We provision our collectd's using puppet, and have them push their metrics to multiple monitoring hosts on the Internet every second. Yes, you read that right, collectd uses UDP, so we're pushing UDP over the Internet. I hear you cry in horror that "Packets may get lost". Yes they may, and yes they do. But that's OK, the data will come in a few seconds later. It's no big deal. We've chosen to use the signing and encryption options, because we're paranoid and proud of it. We have our collectd's gather all the usual system data, but also application-specific metrics, from applications that support this, and e.g. JVM memory metrics.
The monitoring hosts have collectd instances in listening mode, so they get (most of) the collected data, which gives us the view from *inside* the hosts. Also, the monitoring hosts run any and all kinds of custom service tests, exercising the Internet-published services from the outside. This is the external view: What will the end-user experience. These tests push their results into the same collectd instances, meaning these now have all the relevant metrics.
Hot-pluggable components
As much as we love collectd's efficient binary UDP protocol, we want the simplest possible protocol, and that isn't it.
Using a small collectd write plugin, we write whatever collectd gathers to a multicast group, UDP again, in the simplest format we could find: label-value pairs. The protocol is this:
Metrics are grouped in "shuttles". Each shuttle consists of a number of lines, followed by a blank line.
Each line consists of a label, an equals sign, a value, and a carriage return.
A label represents a reverse-FQDN of your internet domain, followed by whatever hierarchical representation you see fit. Here are some lines from a shuttle:
be.apsu.prod.eridu.df.var.df_complex.reserved.percentage=5.16165872485
be.apsu.prod.eridu.df.opt.df_complex.reserved.percentage.state=0
be.apsu.prod.eridu.df.opt.df_complex.free.percentage.state.comment=More Than 60% Free Space
be.apsu.prod.eridu.df.home.df_complex.reserved.percentage.state=0
be.apsu.prod.eridu.df.tmp.df_complex.reserved.percentage=5.1617400345
be.apsu.prod.eridu.apsu_be.https.httpprobe.responsetime=127.850000
be.apsu.prod.eridu.apsu_be.http.httpprobe.responsetime=49.020000
The plugin adds a timestamp in ms, so every shuttle has one (not shown above).
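To show how little is needed to read this format, here is a minimal parsing sketch in Python. It follows the description above (label, equals sign, value; a blank line closes a shuttle) and makes a few assumptions of its own: the function name is invented, values are kept as strings, and the timestamp line gets no special treatment because its exact form isn't shown here.

```python
def parse_shuttles(text):
    """Split raw text into shuttles; each shuttle is a dict of label -> value."""
    shuttles = []
    current = {}
    # splitlines() copes with "\r", "\n" and "\r\n" line endings alike.
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current shuttle.
            if current:
                shuttles.append(current)
                current = {}
            continue
        # Split on the first "=" only, so values may contain "=" themselves.
        label, sep, value = line.partition("=")
        if sep:
            current[label] = value
    if current:
        shuttles.append(current)
    return shuttles

sample = (
    "be.apsu.prod.eridu.df.tmp.df_complex.reserved.percentage=5.1617400345\r\n"
    "be.apsu.prod.eridu.apsu_be.http.httpprobe.responsetime=49.020000\r\n"
    "\r\n"
)
for shuttle in parse_shuttles(sample):
    for label, value in shuttle.items():
        print(label, "->", value)
```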
Since these are multicast (with a ttl of zero), any process on the same monitoring host can join that multicast group and read all the metrics from all the collectd and custom agents. Here's where filters can clean up the namespace where necessary, contributors can translate values into states and trends, trends into states, states into alerts, and aggregators can contribute calculated values. Contributions just go back into the cauldron. For example, the "percentage" metric in the example above is contributed by a "df" aggregator which takes reserved, free, and in use metrics and calculates their equivalent percentages. "percentage.state" and "percentage.state.comment" are contributed by a "df.state" contributor that decides which percentage values are OK, for which disks.
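The logic inside such an aggregator and contributor can be tiny. The sketch below is not Extremon code: the function names, the "used" key, the 60% threshold and the meaning of state 1 are assumptions, chosen only so that the result lines up with the sample shuttle lines shown earlier (state 0 with the comment "More Than 60% Free Space").

```python
def df_percentages(reserved, free, used):
    """Turn absolute disk figures into percentages of the total."""
    total = reserved + free + used
    if total == 0:
        return {}
    return {
        "reserved.percentage": 100.0 * reserved / total,
        "free.percentage": 100.0 * free / total,
        "used.percentage": 100.0 * used / total,
    }

def free_state(free_percentage, ok_above=60.0):
    """Return (state, comment); 0 means OK, 1 is a hypothetical 'not OK' state."""
    if free_percentage > ok_above:
        return 0, f"More Than {ok_above:g}% Free Space"
    return 1, f"{ok_above:g}% Or Less Free Space"

percentages = df_percentages(reserved=5, free=70, used=25)
state, comment = free_state(percentages["free.percentage"])
print(percentages, state, comment)
```

A real contributor would read the input values from the cauldron and write its results back as new label-value lines.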
We call this multicast group "The cauldron", since this is where all the ingredients are added and transformed. The nice thing about the multicast group, is that it's easy to plug into, live, easy to read and write from, by any process, in any language, without interrupting anything else, and we get an extraordinarily robust and proven implementation of it with any GNU/Linux we install.
In the cauldron, any metric (and all its derived values, such as states, aggregates, etc.) appears each time the metric is received, and all metrics appear, for the entire namespace, so the cauldron may "boil" intensely if you add many metrics. For example, the cauldron on each of the 2 monitoring hosts we're working on today "boils" at about 5000 metrics per second. It only looks intense when you look at it. To the machine, that's only 64Kbyte/sec, even without compression.
To add more hosts, for scaling, we would simply connect them using Ethernet, and set the ttl to 1 instead of 0, to allow the multicast out of the host. But we're far far away from needing that kind of scaling, at this point.
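Joining a multicast group and choosing the ttl both take only a few lines with the standard socket module. The following is a sketch rather than the actual Extremon setup: the group address and port are placeholders picked for this example, and the function names are invented.

```python
import socket

GROUP = "239.192.0.1"  # hypothetical multicast group, not the real cauldron address
PORT = 1249            # hypothetical port

def membership_request(group):
    """Build the 8-byte ip_mreq value: group address + any local interface."""
    return socket.inet_aton(group) + socket.inet_aton("0.0.0.0")

def open_receiver(group=GROUP, port=PORT):
    """Return a UDP socket that receives everything sent to the group."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    # Let several processes on the same host listen on the same port.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                    membership_request(group))
    return sock

def open_sender(ttl=0):
    """Return a UDP socket for sending; ttl=0 keeps packets on this host."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, ttl)
    return sock

# Typical use (commented out so nothing blocks when the file is imported):
# receiver = open_receiver()
# sender = open_sender(ttl=0)
# sender.sendto(b"be.example.test.value=1\r\n\r\n", (GROUP, PORT))
# print(receiver.recv(65535))
```

Changing `ttl=0` to `ttl=1` in `open_sender` is the code-level counterpart of letting the multicast leave the host.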
Simple Text-based Internet-Friendly Subscription Push API
One type of process in the cauldron allows multiple TCP connections, reads a simple HTTP URL, consisting of the /-separated namespace, and serves shuttles conforming to that URL on a TCP connection, starting off with a complete set of all the current values, followed by updates. This allows any application to subscribe to the metrics it needs, and update a local cache. (Or not. If you were writing that Widget, you might not even keep any cache, just update the widget as the data evolved.) We serve this with an Apache web server in front, to handle security and encryption.
Let's see how idle our CPUs are for these 2 systems (app1 and app2):
$ wget https://<hidden>/*/cpu/*/cpu/idle/value --user..
hidden.app2.cpu.0.cpu.idle.value=89.2023
hidden.app1.cpu.0.cpu.idle.value=88.32
hidden.app2.cpu.1.cpu.idle.value=99.1911
hidden.app2.cpu.0.cpu.idle.value=91.8071
hidden.app2.cpu.1.cpu.idle.value=99.8242
hidden.app1.cpu.0.cpu.idle.value=93.0782
hidden.app1.cpu.1.cpu.idle.value=93.5785
hidden.app2.cpu.1.cpu.idle.value=97.9927
hidden.app1.cpu.0.cpu.idle.value=86.2266
hidden.app1.cpu.1.cpu.idle.value=96.7542
The first 4 lines are the values at connection time; the rest of the lines are updates. Since we're measuring at 1Hz, and CPU values tend to change all the time, we get updates every second.
Let's look inside an application (we've set up collectd to take these SNMP measurements on the server in question):
$ wget https://<hidden>/app1/snmp/counter --user=..
hidden.app1.snmp.counter.validations.value=1.99592
hidden.app1.snmp.counter.cache_misses.value=0
hidden.app1.snmp.counter.cache_hits.value=2.49491
hidden.app1.snmp.counter.cache_refreshes.value=0
hidden.app1.snmp.counter.validations.value=3.48378
hidden.app1.snmp.counter.cache_hits.value=5.47449
hidden.app1.snmp.counter.validations.value=3.51587
hidden.app1.snmp.counter.cache_hits.value=4.52042
All production disk usage, live:
$ wget https://<hidden>/prod/**/df/**/free/percentage --user=..
etc.. etc..
A Python client to this is about 23 lines and uses only standard classes.
Of course we have and we'll maintain a few reference clients in various languages.
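For a rough idea of what such a client might look like, here is a sketch using only the standard library. It is not one of the reference clients: the function names are invented, HTTP basic authentication is assumed because the wget examples pass --user, and the URL must be filled in with a real subscription path.

```python
import urllib.request

def apply_lines(cache, lines):
    """Update `cache` in place from an iterable of 'label=value' lines."""
    for raw in lines:
        line = raw.decode("utf-8", "replace") if isinstance(raw, bytes) else raw
        label, sep, value = line.strip().partition("=")
        if sep:
            cache[label] = value  # later updates overwrite earlier values
    return cache

def subscribe(url, user, password):
    """Yield the local cache after every line received from the server."""
    manager = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    manager.add_password(None, url, user, password)
    opener = urllib.request.build_opener(
        urllib.request.HTTPBasicAuthHandler(manager))
    cache = {}
    with opener.open(url) as response:
        # The response is read line by line as the server pushes updates.
        for raw in response:
            apply_lines(cache, [raw])
            yield cache

# Typical use (needs a real URL and credentials, so it is commented out):
# for cache in subscribe("https://<hidden>/app1/snmp/counter", "user", "secret"):
#     print(cache)
```

The cache starts out with the complete set of current values sent at connection time and is then kept fresh by the updates that follow.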
Display on a meaningful representation, and in real-time.
Web pages were intended to convey static documents with links between them. Stretching that metaphor only goes so far, and I don't think web pages are an appropriate medium to convey real-time data (but that's my opinion (fr4nkm); koendc has different ideas, and is working on Javascript-based clients, which, I must say, look pretty impressive). Also, we want a "meaningful representation", which implies that for anything more complex than a single server we want to get away from HTML-driven lists and status colours, and move to a full schematic of the systems we're monitoring, and their connections.
Drawing a full top-level schematic of one's systems is something I have found both extremely useful and relatively rare. Many systems have grown organically with their organisations and have never even thought of drawing such. This makes it very hard for anyone to get a good idea of how the whole functions, and encourages silo-type thinking with everyone just looking at their little part of the world. While there's nothing wrong with drawing a partial diagram, I find the minimum should be the "big picture".
Once you have the "big picture", why not use it to project monitoring data? In that way, you can immediately tell where the data that you're looking at fits into the whole, and, from the states of the connected systems, make deductions about what is going on and what the impact is. This allows for faster triage, and for experts in different domains to gather around the same view of the whole system, and look at their own details, while not loosing sight of the connections.
For example: a web service goes "unusable" in the remote functional test, becomes red and glowy on the display. A sysadmin looks at it (and right clicks it to indicate she's looking into it - see below), zooms in to find the exact measurements, finds that the connection times out. The same functional test, from the inside, that is right next to it, remains OK, responding within a few ms. Triage indicates that this is a network issue, that all the application and backends are fine (and they do show as green), so she gets the network expert to look, he zooms are other parts of the system that are monitored.. If the same service had been merely slow, she would have zoomed in on the warnings in the application server, where she might have found many cache misses, for example, due to some backend problem. If not, she might have asked the developer or application specialist to zoom in on the application metrics.All on the same screen, if required, from anywhere in the world with a reasonable TCP connection, if need be.
The display we're developing uses SVG to display schematics, with the home view being the largest supersystem that we monitor. Here we see systems and their boundaries, and the states and response times of the most important externally offered services, if any. The response times are live: if we see a bar graph shoot outwards and grow red, we know that service is at least slow. If a host goes yellow, it may have a disk space or CPU usage issue; if an application goes red, it may have fatal errors in its log file, JMX, SNMP or other metrics. The point is that SVG is a vector graphics format, so we can have any amount of detail hidden in our larger schematic, and zoom into any part to find those details. Host disk space need not be represented large enough to be readable from the home view: as long as the host state shows a problem, we can zoom in on it to make more detail visible, while mentally retaining the link between that host, its applications, and the whole system. This is a far cry from seeing a red icon next to APPSRVWEB001_VAR_TMP_FREE, and having to make that link mentally.
Also, we can easily give anyone who might want it a read-only view of our systems. It makes a great deal of difference to your customer service experience if a service desk agent, having the same overview capability, can tell a calling customer with confidence where the problem is (or is not) located and that someone is working on it (and perhaps who), and knows whom to contact. Without that capability, they have to "get back" to the customer, leaving the latter with the impression that we're not monitoring at all.
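One way to let a hidden detail colour the element that contains it is to roll the worst state up the hierarchy. The sketch below assumes metric names are dot-separated paths and that there are three states mapped to three colours; both are assumptions made for illustration, as are the names roll_up, SEVERITY and COLOUR, and none of this is taken from the ExtreMon code.

```python
# Assumed state ordering and colours, for illustration only.
SEVERITY = {"OK": 0, "WARNING": 1, "ALERT": 2}
COLOUR = {"OK": "green", "WARNING": "yellow", "ALERT": "red"}


def roll_up(leaf_states):
    """Give every ancestor prefix the worst state found among its leaves."""
    rolled = {}
    for name, state in leaf_states.items():
        parts = name.split(".")
        for depth in range(1, len(parts) + 1):
            prefix = ".".join(parts[:depth])
            current = rolled.get(prefix, "OK")
            # Keep whichever of the two states is more severe.
            rolled[prefix] = max(current, state, key=SEVERITY.get)
    return rolled


leaves = {
    "site.appsrvweb001.disk.var_tmp_free": "WARNING",
    "site.appsrvweb001.cpu": "OK",
    "site.dbsrv001.cpu": "OK",
}
rolled = roll_up(leaves)
print(COLOUR[rolled["site.appsrvweb001"]])
```

With this, the host element can be drawn yellow on the home view even though the disk space figure itself is far too small to read until you zoom in.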
Implicit Provisioning (Test-driven infrastructure)
When we were provisioning machines manually, it followed that we provisioned monitoring manually as well. If you don't have the info in a machine-readable format, you cannot parse it. I know organisations that have 2-3 FTE just working on provisioning monitoring solutions (with Web interfaces, click, click, click all day, doing the same drudge work).
Now that we've moved to automated provisioning, it would make a lot of sense to handle monitoring as much as possible from that same angle. Ideally, what we want is for monitoring to set itself up from the same machine-readable descriptor that will set up the actual infrastructure to monitor, before that happens. We call this "test-driven infrastructure", just as it is called test-driven development in the XP methodology: you write a test (but the information is already largely in your puppet or other description), monitoring starts, and the infrastructure and all it needs to support appears in the namespace and on the screens, with all the states in ALERT because nothing is working, obviously. Then, as the VMs, OS, and services appear, states go to OK. At the end, you know your system is OK because all is green, just as with TDD you know your code is OK because all your tests are green.
We haven't done much on this side of the equation. Much of the ExtreMon configuration is still in hard-coded object graphs (designed to be instantiated from textual descriptors, but this is not yet implemented), so there is still a lot of manual provisioning in there.
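A minimal sketch of that idea follows, under stated assumptions: the descriptor is a plain dictionary of hosts and their services (a stand-in for whatever a puppet or other description would provide), and checks_from_descriptor and all_green are hypothetical helpers, not something ExtreMon implements today.

```python
def checks_from_descriptor(descriptor):
    """Derive one check per host and per service, all starting in ALERT."""
    states = {}
    for host, spec in descriptor.items():
        states[f"{host}.ping"] = "ALERT"
        for service in spec.get("services", []):
            states[f"{host}.{service}"] = "ALERT"
    return states


def all_green(states):
    """True once every derived check has reported OK."""
    return all(state == "OK" for state in states.values())


# Hypothetical descriptor: one web host offering two services.
descriptor = {"web01": {"services": ["http", "ssh"]}}
states = checks_from_descriptor(descriptor)
print(sorted(states), all_green(states))
```

Everything is red before the first VM exists; as the host and its services come up, each check is flipped to OK, and all_green plays the role of the green test bar in TDD.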
Graphing
We don't have anything of our own, and we don't want to reinvent the wheel. We've connected a carbon engine to the cauldron, and it happily keeps track of and graphs our 5K metrics/sec. (but we had to do some tuning). Ideally, we should get our display code to display those graphs.
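For readers who have not fed carbon before, here is a small sketch of what pushing a metric looks like, assuming the carbon engine is Graphite's carbon daemon with its plaintext listener on the usual port 2003. The metric path, host and helper names are made up for the example; this is not how the cauldron is actually wired to carbon.

```python
import socket
import time


def carbon_line(path, value, timestamp=None):
    """Format one metric in carbon's plaintext form: 'path value timestamp'."""
    if timestamp is None:
        timestamp = int(time.time())
    return f"{path} {value} {int(timestamp)}\n"


def send_to_carbon(lines, host="localhost", port=2003):
    """Send formatted lines to a carbon plaintext listener (assumed address)."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall("".join(lines).encode("ascii"))


# Hypothetical metric name and value, with a fixed timestamp.
line = carbon_line("site.web01.http.responsetime", 12.5, 1328282520)
print(line, end="")
```

Calling send_to_carbon([line]) would deliver it to a listening carbon instance; at thousands of metrics per second you would batch many lines per connection rather than open one for each.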
Schematic Overview
The Code So Far
I'm consolidating our 4 private github repos to create new, public ones, by or at the #monitoringsux hackathon.
Done: https://github.com/m4rienf/ExtreMon-Display
Done: https://github.com/m4rienf/ExtreMon
ToDo: Koen's Javascript clients, Java browser namespace browser applet
03 Feb 2012 3:22pm GMT
Guy Van Sanden: Why does the upgrade-manager in precise insist on removing skype?
After upgrading to Precise, I noticed that Skype was uninstalled. But it was easily fixed by downloading the deb from Skype's site.
But now, at each update via update-manager, it says the skype package should have been removed and that I need to remove it before proceeding.
Is this a bug? Any workaround?
03 Feb 2012 9:39am GMT
FOSDEM organizers: Friday build-up
FOSDEM is almost upon us.
We will begin building up the ULB campus on Friday at 13:00. If you are around and want to help out, do join us!
Most of the work could be finished by 18:00. If you are hesitating to join in the late afternoon, check this post to see whether help is still needed.
03 Feb 2012 12:09am GMT
02 Feb 2012
Planet Grep
Frank Goossens: Is Lana del Rey a Meat Puppet?
The Meat Puppets wrote it, but Nirvana stole the show with it:
And Lana Del Rey, she's a meat puppet too, just listen:
Hearing vague resemblances like this may be a minor quirk of mine, but … seriously, Lana?
Possibly related twitterless twaddle:
- Good God, that new Anouk really swings!
- Vive La Fête: had too much of the elixir?
- Funky music is maths too
02 Feb 2012 5:21pm GMT



