05 Feb 2012
Planet filibeto
Blog O' Matty: Which file system should I use with Gluster?
I was reading through the Gluster 3.2.5 release notes today and came across the following blurb: Red Hat recommends XFS when formatting the disk sub-system. XFS supports metadata journaling, which facilitates quicker crash recovery. The XFS file system can also be de-fragmented and enlarged while mounted and active. Any other POSIX compliant disk file system, [...]
05 Feb 2012 8:53pm GMT
03 Feb 2012
Darryl Gove: Using prtpicl to get cache sizes
If you are on a SPARC system you can get cache size information using the command fpversion, which is provided with Studio:
$ fpversion
A SPARC-based CPU is available.
Kernel says main memory's clock rate is 1012.0 MHz.
Sun-4 floating-point controller version 0 found.
An UltraSPARC chip is available.
Use "-xtarget=sparc64vii -xcache=64/64/2:5120/256/10" code-generation option.
The cache parameters are output exactly as you would want to pass them into the compiler - for each cache it describes the size in KB, the line size in bytes, and the associativity.
fpversion doesn't exist on x86 systems. The next best thing is to use prtpicl to output system configuration information, and inspect that output for cache size. Here's the cache output for the same SPARC system using prtpicl.
$ prtpicl -v |grep cache
:l1-icache-size 0x10000
:l1-icache-line-size 0x40
:l1-icache-associativity 0x2
:l1-dcache-size 0x10000
:l1-dcache-line-size 0x40
:l1-dcache-associativity 0x2
:l2-cache-size 0x500000
:l2-cache-line-size 0x100
:l2-cache-associativity 0xa
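As a cross-check, the hex values printed by prtpicl can be converted into the same -xcache string that fpversion suggests. The following is a minimal Python sketch, not something shipped with Studio: the function name is made up for illustration, and it assumes that the first -xcache level corresponds to the L1 data cache properties.

```python
# Sample prtpicl output copied from the listing above.
SAMPLE = """
:l1-icache-size 0x10000
:l1-icache-line-size 0x40
:l1-icache-associativity 0x2
:l1-dcache-size 0x10000
:l1-dcache-line-size 0x40
:l1-dcache-associativity 0x2
:l2-cache-size 0x500000
:l2-cache-line-size 0x100
:l2-cache-associativity 0xa
"""

def xcache_from_prtpicl(text):
    """Build an -xcache option from ':name 0xvalue' property lines.

    Hypothetical helper: assumes the l1-dcache-* and l2-cache-*
    properties are present, as in the sample above.
    """
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith(":"):
            continue  # skip the command line and blank lines
        name, value = line[1:].split()
        props[name] = int(value, 16)  # int() accepts the 0x prefix with base 16

    levels = []
    for prefix in ("l1-dcache", "l2-cache"):
        size_kb = props[prefix + "-size"] // 1024      # size in KB
        line_bytes = props[prefix + "-line-size"]      # line size in bytes
        assoc = props[prefix + "-associativity"]
        levels.append(f"{size_kb}/{line_bytes}/{assoc}")
    return "-xcache=" + ":".join(levels)

print(xcache_from_prtpicl(SAMPLE))
```

For the values shown, 0x10000 bytes is 64 KB and 0x500000 bytes is 5120 KB, so the result agrees with the fpversion output.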
03 Feb 2012 6:45pm GMT
Darren Moffat: What Free/Open Source software is Solaris 11 still missing
Note this is not a commitment from Oracle to deliver anything as a result of your answers, nor is it an official survey of any kind.
Okay, first my dirty little secret... my family home desktop machine runs Windows 7. Earlier this week I had a need to check the MD5 or SHA256 checksum on an iso image I'd downloaded. On Solaris I'd just run 'digest -a sha256', or sha256sum on Solaris or any Linux distro. But on Windows 7 the best I could come up with was to code it up in Java myself or install the GNU versions via Cygwin.
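For what it's worth, a portable alternative can be sketched in a few lines of Python using the standard hashlib module. This is only an illustration, assuming a Python interpreter is available on the machine; the function name and chunk size are arbitrary choices.

```python
import hashlib

def sha256_of_file(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks.

    Reading in chunks keeps memory use flat even for a large iso image.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel stops when read() returns empty bytes.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Calling sha256_of_file with the path of the downloaded image returns a hex string that can be compared against the published checksum.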
So that got me thinking, the Solaris 11 repository has a lot more "upstream" Free/Open Source tools and frameworks than any other release of Solaris ever had. We have Python (which is really a core part of Solaris 11 now), Ruby, loads of the GNU runtime and development toolchains and much much more. However many common Linux distributions still have more than we do, but some of that isn't targeted at server use cases.
So what Free/Open Source software is Solaris 11 still missing that you use to run your business on your Solaris servers?
Even if you don't have Solaris 11 installed you can quickly search for packages at http://pkg.oracle.com/solaris/release
Please add details in the comments.
Again note this is not a commitment from Oracle to deliver anything as a result of your answers, nor is it an official survey of any kind, just my curiosity. I will of course log the relevant bugs for viable things if any come up.
03 Feb 2012 4:05pm GMT
02 Feb 2012
Joerg Moellenkamp: Reminder: Oracle Solaris 11 Techdays 2012
I'd like to point once more to the series of events on Solaris 11 that starts next week. There are already plenty of registrations, but I want to "see the house packed". You'll find more information as well as an agenda here.
02 Feb 2012 8:55pm GMT
Blog O' Matty: ZFS and OS X meet again?
I just came across a reference to ZEVO tonight. This appears to be an add-on package for OS X that is built on top of ZFS. I'm going to have to keep an eye on this. Snapshots, data checksumming, de-dup, compression and zfs send/recv would be pretty cool on my Laptop. :)
02 Feb 2012 2:08am GMT
01 Feb 2012
Steve Tunstall: New Power Calculator is up
The Oracle Power Calculator for the new 3TB, 600GB, and 300GB drive versions of the ZFSSA is now up and running.
From this page, you can click on the "Power Calculators" link on top to go back out to the main screen where you will find power calculators for all of Oracle hardware.
01 Feb 2012 2:36am GMT
Steve Tunstall: How to calculate your usable space on a ZFSSA
So let's say you're trying to figure out the best way to set up your storage pools on a ZFSSA. So many choices. You can have a Mirrored pool, a RAIDz1, RAIDz2, or RAIDz3 pool, a simple striped pool, or (if you're REALLY anal) you can even have a Triple Mirrored pool.
How can you choose which pool to make? What if you want more than one pool on your system? How much usable space will you have when it's all done?
All of these questions can be answered with Ryan Mathew's Size Calculator. Ryan made a great calculator a while back that allows one to use the ZFSSA engine to give you back all sorts of pool results. You simply enter how many disk trays you have, what size drives they are, how many pools you want to make, and the calculator does the rest. It even shows you a nice graphical layout of your trays. Now, it's not as easy as a webpage, but it's not too bad, I promise. It's a python script, but don't let that scare you. I never used Python before I got my hands on this calculator, and it was worth loading it up for this. First, you need to go download and install Python 2.6 here: http://www.python.org/getit/releases/2.6/ Make sure you have 2.6 installed, as the calculator will not work with the newer 3.0 Python. In fact, I had both loaded, and had to completely uninstall 3.0 before it would work with my installed 2.6.
Now, get your hands on the Size Calc script. Ryan is making a new one that is for the general public. It will be out soon. In the meantime, ask your local Oracle Storage SC to do a calculation for you.
This is a copy from Ryan's, but I fixed a few things to make it work on my Windows 7 laptop. If you're not using Windows 7, you may find Ryan's original blog and files here: http://blogs.oracle.com/rdm/entry/capacity_sizing_on_7x20
So now you're ready. Go to a command line and get to the Python26 directory, where you have also placed the "size3.py" script.
Type "size3.py ZFSipaddress password 20"
Use your ZFSSA for the IP address and your root password for the password. You can use the simulator for this. Remember, the simulator is the real code and has no idea it's not a 'real' system.
Mine looks like this: "Size3.py 192.168.56.102 changeme 20" Now, you will see the calculator present a single tray with 20 drives, and all the types of pools you can make with that.

So now, make it bigger. Along with the first tray that has 20 drives (because of the Logzillas, right?), we also want to add a 2nd and a 3rd tray, each full with 24 drives. So type "Size3.py 192.168.56.102 changeme 20 24 24" You could do this all day long. Notice that now you have some extra choices, as the NSPF (no single point of failure) pools are now allowed, since you have more than two trays.

That's it for the basics. Pretty simple. Now, we can get more complicated. Say you don't want one big pool, but want to have an active/active cluster with two pools. Type "Size3.py 192.168.56.102 changeme 10/10 12/12 12/12"

This will create two even pools. They don't have to be even. Check this out. I want to make two pools, one with the first 2 disk trays with 8 logzillas plus half of full trays 3 and 4. So the second pool would only be the other half of trays 3 and 4. I used "Size3.py 192.168.56.102 changeme 20/0 20/0 12/12 12/12"

Here's the last one for today. Say you already have a 2-disk shelf system, with 2 pools, and you set it up like this: "Size3.py 192.168.56.102 changeme 10/10 12/12" Simple. Now, you go out and buy another tray of 24 drives, and you want to add 12 drives to each pool. You can use the "add" command to add a tray onto an existing system. It's very possible that adding a tray will give you different results than if you configured 3 trays to begin with, so be careful. This is a good example. Note that you get different results if you do "10/10 12/12 12/12" than if you do "10/10 12/12 add 12/12".
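To make the tray notation a bit more concrete, here is a small Python sketch that only adds up how many drives each pool gets from a tray specification like the ones above. It is not the size calculator and does not talk to a ZFSSA; the function name is hypothetical, and it assumes that "N/M" means N drives of that tray go to the first pool and M to the second, and that "add" only marks the following trays as later additions.

```python
def drives_per_pool(spec):
    """Sum the drives assigned to each pool in a tray specification.

    Illustrative only: "20 24 24" describes one pool,
    "10/10 12/12" describes two pools, one tray per token.
    """
    totals = []
    for token in spec.split():
        if token == "add":
            continue  # trays after "add" still count toward the totals
        counts = [int(part) for part in token.split("/")]
        if not totals:
            totals = [0] * len(counts)
        if len(counts) != len(totals):
            raise ValueError("every tray must list the same number of pools")
        for i, count in enumerate(counts):
            totals[i] += count
    return totals

print(drives_per_pool("20 24 24"))
print(drives_per_pool("20/0 20/0 12/12 12/12"))
print(drives_per_pool("10/10 12/12 add 12/12"))
```

The drive totals are the same with or without "add"; as noted above, what can differ is the layout the real calculator reports for an expanded system.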

Our next lesson will be about VDEVs. When you add the "-v" option right after "size3.py", you may notice a new column in the output called "VDEVS". These are the most important aspect of your pool. It's very important to understand what these are, how many you need and how many you have.
It's so important, I'm going to save it for another blog topic. Have a great day!!!! :)
01 Feb 2012 2:21am GMT
31 Jan 2012
Dave Miner: Detroit Solaris 11 Forum, February 8
I'm just posting this quick note to help publicize the Oracle Solaris 11 Technology Forum we're holding in the Detroit area next week. There's still time to register and come get a half-day overview of the great new stuff in Solaris 11. The "special treat" that's not mentioned in the link is that I'll be joining Jeff Victor as a speaker. Looking forward to being back in my home state for a quick visit, and hope I'll see some old friends there!
31 Jan 2012 9:21pm GMT
Blog O' Matty: Reading a file into a Python string
I've learned a number of useful things from the Google learn Python video series. One of the tips I got to use today. That tip was Python's ability to read a file into a string: $ cat foo this is a test file of words $ python >>> f = open("foo","r") >>> string = f.read() [...]
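The excerpt above is cut off, so here is a minimal, self-contained sketch of the same idea rather than the original session. It assumes Python 3 and wraps the read in a with block so the file is closed again; the function name is just for illustration.

```python
def read_file_to_string(path):
    """Return the whole contents of a text file as one string."""
    with open(path, "r") as f:
        return f.read()
```

Calling read_file_to_string("foo") returns the entire file, newlines included, which is convenient for small files but reads everything into memory at once.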
31 Jan 2012 1:44am GMT
29 Jan 2012
Blog O' Matty: Getting MySQL running on a CentOS Linux server
I started playing with MySQL back in the 4.X days, but never invested a lot of my time since my day job required me to support Oracle databases. I'm trying to branch out more now, and recently picked up a copy of MySQL, MySQL High Availability and PHP And MySQL. There are a slew of [...]
29 Jan 2012 2:21pm GMT
28 Jan 2012
Blog O' Matty: Integrating ssh-agent into your login process
Most of my readers utilize SSH keys to access remote systems. The security benefits are well known, and key-based authentication makes automating remote tasks a whole lot easier. When you use key-based authentication it becomes imperative to protect your private key, since a third party could access your systems if they were able to gain [...]
28 Jan 2012 1:35pm GMT
25 Jan 2012
Blog O' Matty: Free video tutorials for C, Java, PHP, HTML5, Python, MySQL and more …
I just came across the new boston video tutorial series. I've watched 20 of the PHP videos and am hooked. The production quality is great, and the content is really, really good! Once I finish the 200 PHP videos I plan to watch their MySQL and HTML5 videos. Can't recommend these videos enough, and the [...]
25 Jan 2012 12:36pm GMT
24 Jan 2012
Blog O' Matty: The importance of keeping your storage array firmware up to date
A couple of weeks back I attempted to migrate a pair of clustered Solaris 10 servers to a new disk storage array. After rebooting into single user mode to pick up the new devices, I went to add the new quorum disk with clquorum. This resulted in both nodes panicking with the following panic string: [...]
24 Jan 2012 1:02pm GMT
23 Jan 2012
Constantin Gonzalez: I am a Mobile Sensor Network, Collecting Big Data
Don't worry, this is not a desperate attempt at SEO for my blog (although I do appreciate your likes, Tweets, RSS subscriptions and other ways you help me reach a wider audience), nor is this my entry into the latest contest of IT BS Bingo.
It just occurred to me yesterday that Big Data is everywhere. Even during your weekend jogging run.
Collecting Fitness Data, Step by Step, Heartbeat by Heartbeat, on Your Phone
For Christmas, I bought myself a Wahoo Fitness Key* and its matching ANT+ heart rate monitor (HRM)*. The key plugs into your iPhone and provides connectivity to the ANT+ wireless sensor protocol. The HRM is another dongle that straps around your chest and electrically registers every heart beat, then transmits the data to the Wahoo key. If you have an iPhone 4S, you can do without the key and just buy a Bluetooth HRM like the Wahoo BlueHR, because the iPhone 4S supports Bluetooth 4.0, which includes a low power version of the protocol that supports sensor collection devices such as HRMs that run off of a coin cell.
So iPhone + Wahoo + HRM = Wireless Sensor Network. And if your idea of a network involves more than two participants, Wahoo also sells an ANT+ pedometer* to measure your stepping frequency along with heart beat data as well.
(Android users: I'm sure you'll find a similar solution for yourselves as well. I just happen to prefer quality over popularity.)
Running 2.0
Thanks to modern gadgetry, apps like iSmoothRun on my phone can now tell me how I'm doing while I'm running, including time, distance (thanks to GPS, which is another sensor), pace, cadence (using the phone's accelerometer or a wireless pedometer*) and heart rate. I can also set up a target running profile (like "No more than 70% of max. heart rate so I can stay in the aerobic zone, please.") and my phone will duck the music and tell me to slow down whenever I go beyond target heart rate.
Pretty cool.
Social Network Running
But we live in the age of web 2.0 so there's obviously more to do if you want to maintain your running geek-cred: The iPhone also collects all data (position, heart-beats, and steps) over time and at the end of the run, it will not only present me with my running statistics, possibly spiced up with current weather data etc., it will also offer to upload the data to one of the emerging fitness social networks, such as RunKeeper.com.
Sites like RunKeeper take the data and create web maps with my running path, complete with nice graphs that I can dive into for analyzing my own running behavior including altitude, pace, heart rate, cadence etc. They also collect other data such as weight and body fat percentage (yes, using a Withings Scale* for example, you can track weight/bodyfat data too, even data from a sleep tracking system* can be collected!) and show you your running (or fat loss) progress over time.
And thanks to social network goodness, you can run with friends over the network and compare statistics even if you're not physically running at the same time. Or the same place.
And this is where Big Data comes into play, but what is it and how does it work?
The Advent of Big Data
The first time I heard about big data was during an internal workshop about the Sun Cloud in 2009 (you know, the old Sun habit of being way before our time). While we contemplated the implications of cloud computing for enterprises, someone mentioned that this would be nothing compared to the implications of Big Data. Back then, Big Data was reserved to web giants like Google and Yahoo! and the occasional large research institute such as CERN.
Big Data is the art of handling (surprise!) large amounts of data. "Large" can be anywhere starting at a dozen Terabytes or a couple of Petabytes or any large number that no-one in their right mind would place into a single database on a single server.
Big Data has been made popular by innovations from web companies like Google, Yahoo, Facebook or Twitter, who pioneered new ways of handling huge amounts of data.
Today, Big Data is about to cross the chasm* from the domain of a few innovators and early adopters to the early majority, as businesses start to realize its value.
The Four V's of Big Data
Big Data is typically associated with four V's:
- Volume, meaning lots and lots of data from sensors, devices, social networks, the web, retail offices, mobile fleets, etc.,
- Variety, meaning there's no predefined structure in the data that one can rely on: Unstructured data. This is the main differentiator against classic data warehousing, which is strictly structured.
- Value, meaning that somewhere within that data, there is some valuable information to extract, though most of the pieces of data individually may seem valueless.
- Velocity, meaning quick turnaround cycles, quick, almost real-time processing and also short innovation cycles. Fail fast, fail often is the mantra, until you hit data gold.
RunKeeper and Big Data
Let's come back to our running example: RunKeeper is a Big Data company because it collects GPS, heart rate, cadence and other data from its millions of users. Assuming that only half of their 6 million users actually use the service for real, and that they run once a week and assuming a data size of 50 KByte per run (including GPS positions), we get 7.8 TByte of data per year. This is not a lot by Big Data standards, and it is quite structured, but when you combine this data with Tweets, Facebook status updates, other exercise data and nutrition/sleep data (RunKeeper does all of the above), then data volume easily increases to more than 10 TB per year, which is quite a lot to wade through.
And if you start counting records, the complexity is overwhelming: Each GPS sample is about 100 Bytes, which means that RunKeeper's 10TB per year translates into roughly 100 Billion records to correlate, analyze and create meaning from.
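The back-of-the-envelope numbers above can be reproduced in a few lines of Python. This is just the arithmetic from the text written out, using decimal units (1 KByte = 1000 Bytes, 1 TByte = 10^12 Bytes), which is an assumption about how the figures were rounded.

```python
# Inputs taken from the estimate in the text.
total_users = 6_000_000
active_users = total_users // 2      # "only half ... actually use the service"
runs_per_year = 52                   # one run per week
bytes_per_run = 50 * 1000            # 50 KByte per run

TB = 10**12
yearly_bytes = active_users * runs_per_year * bytes_per_run
print(yearly_bytes / TB)             # TByte of run data per year

# Record count for the combined 10 TB per year at about 100 Bytes per sample.
bytes_per_record = 100
records = 10 * TB // bytes_per_record
print(records)
```

The first figure comes out at 7.8 TByte per year and the second at 100 billion records, matching the estimates above.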
What meaning?
The Meaning of Big Data
And that is the goal of Big Data: To create meaning out of billions of records that seem so innocent, if looked at individually. In the RunKeeper example, they create graphs of your running history and help you analyze and optimize your fitness either for free or as a paid, "pro" service. And thanks to their Health Graph API, an eco-system of other applications and companies emerges who slice and dice RunKeeper's data in other creative ways, trying to create valuable (and monetizable) meaning out of it. Example: World-Rank.in collects data from RunKeeper and Twitter, then ranks runners into its own top 30 lists.
Other companies use Big Data to identify patterns in their customers' behavior, find threats or opportunities to act upon, or simply alert hospitals that a new flu epidemic is about to hit them.
How Big Data Works
Most Big Data use cases work around the same pattern:
- Acquire: Data is collected. Speed and scalability are critical here, not necessarily high availability. It's ok to be offline for a few minutes, even hours, or to lose the occasional data record, but it's important to catch as much as possible. The Hadoop framework and file system and/or NoSQL databases are key tools at this step.
- Organize: To make large amounts of data manageable, a divide and conquer approach is taken: Data is mapped to some interesting nomenclature/metric/attribute (For example: is this a positive or a negative tweet? What company is this tweet about? Was that run a new record or a below-average result?), then the data is reduced into a more condensed form ("Number of negative tweets that mention our company", "Fastest 10k runs per country", etc.). These two steps are the key in the MapReduce framework and can be used repeatedly and creatively to compose a new, more valuable data source, like: "Top 10 keywords associated with negative feelings towards your company.", or "Top runners per age category and country", or "Best training improvement over the last year", or even crazy stuff like "Fastest music to run with".
- Analyze: With that kind of data and processing at your fingertips, new kinds of insight are possible that can be analyzed and acted upon. Which flights are delayed and carry unhappy social media celebrities so you better take good care of them? How about sponsoring your top runner in a certain category and create a celebrity out of her/him? Of course, valuable analysis can also be monetized. Perhaps some research institute or some sports company is interested in getting access to all that heart beat, speed and nutrition data (anonymized of course)?
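The map and reduce steps from the Organize stage can be sketched in plain Python, using the "Fastest 10k runs per country" example. This is a single-process illustration of the idea, not Hadoop code: the records are made up, and the function names are hypothetical.

```python
# Made-up sample records: (runner, country, 10k time in seconds).
RUNS = [
    ("anna", "DE", 2710),
    ("ben", "US", 2655),
    ("carla", "DE", 2598),
    ("dave", "US", 2802),
]

def map_run(run):
    """Map step: turn one record into a (key, value) pair keyed by country."""
    runner, country, seconds = run
    return country, (seconds, runner)

def reduce_fastest(pairs):
    """Reduce step: keep only the smallest (seconds, runner) value per key."""
    fastest = {}
    for country, value in pairs:
        if country not in fastest or value < fastest[country]:
            fastest[country] = value
    return fastest

fastest_per_country = reduce_fastest(map_run(run) for run in RUNS)
print(fastest_per_country)
```

In a real MapReduce framework the map calls run in parallel on many machines and the pairs are grouped by key before reducing; the sketch only shows the shape of the two functions.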
Oracle and Big Data
Don't worry, this commercial break will be brief, but interesting:
Oracle's big strengths of course are in handling commercial data warehouses and analyzing business information data, as well as building Engineered Systems that remove the pain of setting up an IT shop while optimizing the usefulness you get from your systems.
Big Data's strength lies in its innovation to handle and organize unstructured, large data sets, through the Hadoop filesystem, the MapReduce framework, the R statistical language and other emerging technologies. But analyzing data after these steps is still in its infancy.
By combining the worlds of Big Data, Data Warehousing and Business Intelligence, running on Engineered Systems, Oracle can offer unique value to businesses who want to leverage Big Data for their benefit, without going through the trial/error/research of running their own Big Data development operations.
Learn more from Oracle's Big Data White Paper, it's really good, and check out Oracle's Big Data home page.
Building your own Sensor Driven Big Data Collection Network
As you can see, Big Data is fun and healthy. Here are some gadgets* to get your own Sensor Network based Big Data collection infrastructure set up that feeds into RunKeeper and other Big Data collecting social networks for your analytical pleasure:
Big Data and You
What are your favorite Big Data examples? Do you see Big Data being used in your company? Have you played with collecting, organizing and analyzing Big Data yourself? Leave a comment and share!
Finally, here's a video that shows the beauty of collecting, organizing, analyzing and visualizing of Big Data:
And if you want to see my own small chunks of running data, feel free to join my Street Team on RunKeeper.
Disclaimer: Neither I nor Oracle is affiliated with RunKeeper (not that I know of). I just think it's a cool service.
*Disclosure: Some product links in this article are affiliate links. If you buy through them, I'll get a small kickback to help with hosting costs for this blog at no extra charge to you.
23 Jan 2012 3:40pm GMT
Joerg Moellenkamp: Oracle Solaris 11 Tech Days 2012
In February there is a series of events in Germany and Switzerland on the topic of Solaris 11. The events promise to be technically very interesting, as the speakers are all very deep into the subject matter. Since the Containerleitfaden (container guide) I probably don't need to say anything more about Detlef Drewanz, who is present at all events, and the same goes for Uli Gräf (who speaks at some, but not all locations). Christian Ritzka and Elke Freymann are proven experts on OpsCenter. And yes, I'm giving a talk as well, about data management in Solaris 11. And since I have already seen the question twice: The event is free of charge.
| 08:30 - 09:00 | Registration |
| 09:00 - 09:15 | Welcome |
| 09:15 - 10:00 | What's new in Oracle Solaris 11 |
| 10:00 - 11:00 | Oracle Solaris 11 Installation |
| 11:00 - 11:30 | Break |
| 11:30 - 12:30 | Oracle Virtualization: Extensive virtualization technologies are integrated into Oracle Solaris 11. Learn all about the new network virtualization in Oracle Solaris 11 and how complete multi-tier hardware infrastructures can be built in a single machine together with the Oracle Virtual Machine framework and Solaris Zones. |
| 12:30 - 13:30 | Lunch |
| 13:30 - 14:15 | Management of IT Infrastructures: Virtualization doesn't just mean "hypervisor". In this talk we show how virtualized Oracle Solaris 11 environments can be managed centrally. |
| 14:15 - 14:45 | The Solaris Training Program: Together with our training partners, Oracle University offers a comprehensive program for deepening Solaris knowledge. This talk covers the training paths, courses and certifications for Solaris 11 and presents the available learning formats. |
| 14:45 - 15:15 | Break |
| 15:15 - 15:45 | Oracle Solaris 11 Datamanagement |
| 15:45 - 16:15 | Panel, Q&A |
| 16:15 - 16:45 | Refreshments, time for discussion with the experts |
You can find the exact agenda with the speakers at each location, and a way to register, on the event pages:
- Düsseldorf 6.2.2012
- Stuttgart 8.2.2012
- Hamburg 9.2.2012
- Potsdam 10.2.2012
- Frankfurt 13.2.2012
- München 14.2.2012
- Zürich 28.2.2012
Please come in large numbers!
23 Jan 2012 2:18pm GMT
Blog O' Matty: How to figure out if a process has been chroot()’ed
A number of applications (e.g., custom chroot jails, openssh, vsftp, apache) support the ability to chroot themselves. To find out if a process called chroot() at startup, you can check the /proc/<pid>/root entry for the process. For non-chrooted processes this entry will point to /: $ ps auxwww | grep [s]endmail root 3643 0.0 0.1 [...]
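The check described above can also be scripted. The following Python sketch assumes a Linux-style /proc layout, where /proc/<pid>/root is a symbolic link to the process's root directory; the function names and the proc parameter are illustrative additions, and inspecting processes owned by other users typically requires root privileges.

```python
import os

def process_root(pid, proc="/proc"):
    """Return the target of the /proc/<pid>/root symbolic link."""
    return os.readlink(os.path.join(proc, str(pid), "root"))

def is_chrooted(pid, proc="/proc"):
    """Report whether the root link of a process points somewhere other than /."""
    return process_root(pid, proc) != "/"
```

For example, is_chrooted(3643) would inspect the process with that PID; the proc argument only exists so the lookup can be pointed at a different directory tree, such as a test fixture.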
23 Jan 2012 1:07pm GMT
