Friday, July 27, 2012

Loading protobuf-format files into a Pig script using a LoadFunc UDF


Twitter's open-source library Elephant Bird has many such loaders: https://github.com/kevinweil/elephant-bird
For protobuf files you can use LzoProtobufB64LinePigLoader or LzoProtobufBlockPigLoader: https://github.com/kevinweil/elephant-bird/tree/master/src/java/com/twitter/elephantbird/pig/load
To use them:
1) Build elephant-bird using Maven.
2) Register elephant-bird's core and pig jars in your Pig script.
3) Load the LZO files with one of the loader classes. The example below uses com.twitter.elephantbird.pig.load.LzoTokenizedLoader, which splits each line of LZO-compressed text on a delimiter ('|' here).
Example:
REGISTER /root/pig/eleplant-bird/elephant-bird-core-3.0.2.jar;
REGISTER /root/pig/eleplant-bird/elephant-bird-pig-3.0.2.jar;
A = LOAD '/root/files' USING com.twitter.elephantbird.pig.load.LzoTokenizedLoader('|');
STORE A INTO '/root';
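As its name suggests, LzoProtobufB64LinePigLoader reads LZO-compressed files that hold one Base64-encoded, serialized protobuf message per line. The Python sketch below shows only that line layout, assuming the messages are already serialized to bytes; encode_b64_lines and decode_b64_lines are hypothetical helpers, and the LZO compression and the protobuf classes themselves are left out.

```python
import base64

def encode_b64_lines(records):
    """Turn serialized messages (bytes) into text with one Base64 string per line."""
    return "".join(base64.b64encode(r).decode("ascii") + "\n" for r in records)

def decode_b64_lines(text):
    """Turn one-Base64-string-per-line text back into the serialized messages."""
    return [base64.b64decode(line) for line in text.splitlines() if line.strip()]

# b"\x08\x96\x01" is the protobuf wire encoding of "field 1 = 150" (a varint).
records = [b"\x08\x96\x01"]
text = encode_b64_lines(records)
print(text, end="")                       # CJYB
print(decode_b64_lines(text) == records)  # True
```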

Monday, July 16, 2012

PHP Time Diff

How to calculate minute difference between two date-times in PHP?


Here is the answer:

$to_time = strtotime("2008-12-13 10:42:00");
$from_time = strtotime("2008-12-13 10:21:00");
// strtotime() returns Unix timestamps in seconds, so divide the difference by 60
echo round(abs($to_time - $from_time) / 60, 2) . " minutes"; // 21 minutes
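For comparison, a minimal sketch of the same calculation in Python, assuming both times use the "YYYY-MM-DD HH:MM:SS" format above and the same time zone (naive datetimes, so daylight-saving changes are not taken into account). minutes_between is a hypothetical name.

```python
from datetime import datetime

def minutes_between(a: str, b: str) -> float:
    """Absolute difference between two 'YYYY-MM-DD HH:MM:SS' times, in minutes."""
    fmt = "%Y-%m-%d %H:%M:%S"
    delta = datetime.strptime(a, fmt) - datetime.strptime(b, fmt)
    # Like the PHP version: absolute value, converted to minutes, rounded to 2 decimals.
    return round(abs(delta.total_seconds()) / 60, 2)

print(minutes_between("2008-12-13 10:42:00", "2008-12-13 10:21:00"), "minutes")  # 21.0 minutes
```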

Tuesday, July 10, 2012

Schedule tasks on Linux using crontab


If you've got a website that's heavy on your web server, you might want to run processes such as generating thumbnails or enriching data in the background, so that they don't interfere with the user interface. Linux has a great program for this called cron. It runs tasks automatically in the background at regular intervals. You could also use it to create backups, synchronize files, schedule updates, and much more. Welcome to the wonderful world of crontab.

Crontab

The crontab (cron derives from chronos, Greek for time; tab stands for table) command, found in Unix and Unix-like operating systems, is used to schedule commands to be executed periodically. To see the cron jobs scheduled in root's crontab, open a terminal and run (leave out sudo to see your own crontab):
sudo crontab -l
To edit the list of cron jobs, run:
sudo crontab -e
This will open the default editor (for example vi or pico; you can pick a different one by setting the EDITOR environment variable) so you can edit the crontab. When you save and exit the editor, your cron jobs are saved into the crontab. Cron jobs are written in the following format:
* * * * * /bin/execute/this/script.sh

Scheduling explained

As you can see there are 5 stars. The stars represent different date parts in the following order:
  1. minute (from 0 to 59)
  2. hour (from 0 to 23)
  3. day of month (from 1 to 31)
  4. month (from 1 to 12)
  5. day of week (from 0 to 6) (0=Sunday)
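Because the first five whitespace-separated fields are the schedule and everything after them is the command, a crontab line can be split mechanically. A minimal Python sketch; split_cron_line and the field names are hypothetical, and comment lines, variable lines such as MAILTO and the @ keywords are not handled.

```python
FIELD_NAMES = ("minute", "hour", "day_of_month", "month", "day_of_week")

def split_cron_line(line: str) -> dict:
    """Split a five-field crontab line into its schedule fields and the command."""
    # At most 6 parts: the five time fields, then the rest of the line as the command.
    parts = line.split(None, 5)
    if len(parts) < 6:
        raise ValueError("expected five time fields followed by a command")
    fields = dict(zip(FIELD_NAMES, parts[:5]))
    fields["command"] = parts[5]
    return fields

print(split_cron_line("0 1 * * 5 /bin/execute/this/script.sh"))
```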

Execute every minute

If you leave the star, or asterisk, it means "every". Maybe that's a bit unclear. Let's use the previous example again:
* * * * * /bin/execute/this/script.sh
They are all still asterisks! So this means execute /bin/execute/this/script.sh:
  1. every minute
  2. of every hour
  3. of every day of the month
  4. of every month
  5. and every day in the week.
In short: This script is being executed every minute. Without exception.

Execute every Friday 1AM

So if we want to schedule the script to run at 1AM every Friday, we would need the following cronjob:
0 1 * * 5 /bin/execute/this/script.sh
Get it? The script is now being executed when the system clock hits:
  1. minute: 0
  2. of hour: 1
  3. of day of month: * (every day of month)
  4. of month: * (every month)
  5. and weekday: 5 (=Friday)
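To make the field-by-field check concrete, here is a minimal Python sketch (not how cron itself is implemented) that tests whether a given moment matches a schedule. It only understands *, single numbers and a-b ranges, and it simply requires all five fields to match, ignoring cron's rule that when both day of month and day of week are restricted, a match on either one is enough. cron_matches and field_matches are hypothetical names.

```python
from datetime import datetime

def field_matches(field: str, value: int) -> bool:
    """Match one cron field containing '*', a single number, or an a-b range."""
    if field == "*":
        return True
    if "-" in field:
        low, high = (int(x) for x in field.split("-"))
        return low <= value <= high
    return int(field) == value

def cron_matches(schedule: str, when: datetime) -> bool:
    """Check the five time fields of a schedule against a datetime."""
    minute, hour, dom, month, dow = schedule.split()[:5]
    # Python's weekday() counts Monday=0; cron's day of week counts Sunday=0.
    cron_dow = (when.weekday() + 1) % 7
    return (field_matches(minute, when.minute)
            and field_matches(hour, when.hour)
            and field_matches(dom, when.day)
            and field_matches(month, when.month)
            and field_matches(dow, cron_dow))

print(cron_matches("0 1 * * 5", datetime(2012, 7, 27, 1, 0)))  # True (a Friday)
print(cron_matches("0 1 * * 5", datetime(2012, 7, 28, 1, 0)))  # False (a Saturday)
```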

Execute on workdays 1AM

So if we want to schedule the script to run Monday through Friday at 1 AM, we would need the following cronjob:
0 1 * * 1-5 /bin/execute/this/script.sh
Get it? The script is now being executed when the system clock hits:
  1. minute: 0
  2. of hour: 1
  3. of day of month: * (every day of month)
  4. of month: * (every month)
  5. and weekday: 1-5 (=Monday through Friday)

Execute at 10 minutes past every hour on the 1st of every month

Here's another one, just for practice:
10 * 1 * * /bin/execute/this/script.sh
Fair enough, it takes some getting used to, but it offers great flexibility.

Neat scheduling tricks

What if you want to run something every 10 minutes? Well, you could do this:
0,10,20,30,40,50 * * * * /bin/execute/this/script.sh
But crontab allows you to do this as well:
*/10 * * * * /bin/execute/this/script.sh
Which does exactly the same. Can you do the math? ;)
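The math, as a small Python sketch: the hypothetical expand_field helper turns one field into the list of values it stands for, handling lists (,), ranges (-), * and steps (/). It assumes a step only follows * or a range and rejects anything else; apart from a bounds check it does no validation.

```python
def expand_field(field: str, low: int, high: int) -> list:
    """Expand one cron field into the sorted values it covers within low..high."""
    values = set()
    for part in field.split(","):
        base, _, step_text = part.partition("/")
        step = int(step_text) if step_text else 1
        if base == "*":
            start, end = low, high
        elif "-" in base:
            start, end = (int(x) for x in base.split("-"))
        elif step_text:
            raise ValueError(f"step without '*' or a range: {part!r}")
        else:
            start = end = int(base)
        if start < low or end > high:
            raise ValueError(f"{part!r} is outside {low}-{high}")
        values.update(range(start, end + 1, step))
    return sorted(values)

# Minutes run from 0 to 59.
print(expand_field("*/10", 0, 59))  # [0, 10, 20, 30, 40, 50]
print(expand_field("*/10", 0, 59) == expand_field("0,10,20,30,40,50", 0, 59))  # True
```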

Special words

Instead of the five time fields, you can also use one of these keywords:
@reboot     Run once, at startup
@yearly     Run once  a year     "0 0 1 1 *"
@annually   (same as  @yearly)
@monthly    Run once  a month    "0 0 1 * *"
@weekly     Run once  a week     "0 0 * * 0"
@daily      Run once  a day      "0 0 * * *"
@midnight   (same as  @daily)
@hourly     Run once  an hour    "0 * * * *"
The keyword replaces all five time fields, so this would be valid:
@daily /bin/execute/this/script.sh
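The keyword table can also be read as a lookup. A minimal Python sketch (expand_nickname is a hypothetical helper) that rewrites a keyword line into its five-field equivalent; @reboot has no such equivalent, so it is rejected.

```python
# Five-field equivalents of the crontab keywords in the table.
NICKNAMES = {
    "@yearly": "0 0 1 1 *",
    "@annually": "0 0 1 1 *",
    "@monthly": "0 0 1 * *",
    "@weekly": "0 0 * * 0",
    "@daily": "0 0 * * *",
    "@midnight": "0 0 * * *",
    "@hourly": "0 * * * *",
}

def expand_nickname(line: str) -> str:
    """Rewrite '@keyword command' as the equivalent five-field crontab line."""
    keyword, _, command = line.partition(" ")
    if keyword == "@reboot":
        raise ValueError("@reboot has no five-field equivalent; it runs once at startup")
    if keyword not in NICKNAMES:
        return line  # not a keyword line, leave it unchanged
    return f"{NICKNAMES[keyword]} {command}"

print(expand_nickname("@daily /bin/execute/this/script.sh"))
# 0 0 * * * /bin/execute/this/script.sh
```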

Storing the crontab output

By default cron saves the output of /bin/execute/this/script.sh in the user's mailbox (root in this case). But it's prettier if the output is saved in a separate logfile. Here's how:
*/10 * * * * /bin/execute/this/script.sh >> /var/log/script_output.log 2>&1

Explained

Programs on Linux have two output streams: standard output (STDOUT) and standard error (STDERR). STDOUT is numbered 1, STDERR is numbered 2. The first part sends STDOUT to a file. Where > will overwrite the file, >> will append to it. In this case we'd like to append:
>> /var/log/script_output.log
The second part tells the shell to send STDERR to wherever STDOUT is going at that point, creating one data stream for messages & errors:
2>&1
The order matters: the shell processes redirections from left to right, so 2>&1 must come after the file redirection. Written as 2>&1 >> /var/log/script_output.log, STDERR would be pointed at the original STDOUT first, and errors would still end up in your mailbox instead of the log.
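You can check the effect of the redirection order yourself. This Python sketch assumes a Unix-like system with /bin/sh and uses two echo commands as a stand-in for /bin/execute/this/script.sh. It runs the stand-in with both orders of the redirections and returns what landed in the log file and what was left on the original output, which is what cron would mail to you. With >> log 2>&1 both lines reach the log; with 2>&1 >> log the error is left over.

```python
import os
import shlex
import subprocess
import tempfile

# Stand-in for /bin/execute/this/script.sh: one line to STDOUT, one to STDERR.
SCRIPT = 'echo "a message"; echo "an error" >&2'

def run_with_redirect(redirect: str):
    """Run SCRIPT under /bin/sh with the given redirections.

    `redirect` uses {log} as a placeholder for the log file path.
    Returns (contents of the log file, output left on the original streams).
    """
    with tempfile.TemporaryDirectory() as tmp:
        log = os.path.join(tmp, "script_output.log")
        # A brace group applies the redirections to both echo commands at once.
        command = "{ " + SCRIPT + "; } " + redirect.format(log=shlex.quote(log))
        result = subprocess.run(["/bin/sh", "-c", command],
                                capture_output=True, text=True)
        with open(log) as f:
            return f.read(), result.stdout + result.stderr

print(run_with_redirect(">> {log} 2>&1"))  # ('a message\nan error\n', '')
print(run_with_redirect("2>&1 >> {log}"))  # ('a message\n', 'an error\n')
```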

Mailing the crontab output

By default cron saves the output in the user's mailbox (root in this case) on the local system. But you can also configure crontab to forward all output to a real email address by starting your crontab with the following line:
MAILTO="yourname@yourdomain.com"

Mailing the crontab output of just one cronjob

If you'd rather receive only one cronjob's output in your mail, make sure this package is installed:
aptitude install mailx
And change the cronjob like this:
*/10 * * * * /bin/execute/this/script.sh 2>&1 | mail -s "Cronjob output" yourname@yourdomain.com

Trashing the crontab output

Now that's easy:
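*/10 * * * * /bin/execute/this/script.sh > /dev/null 2>&1
This redirects all the output to the null device, also known as the black hole. On Unix-like operating systems, /dev/null is a special file that discards all data written to it. The 2>&1 comes after > /dev/null so that STDERR follows STDOUT into the black hole; the other way around, errors would still be mailed to you.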

Tag: crontab


About

The crontab command, found in Unix and Unix-like operating systems, is used to schedule commands to be executed periodically. It reads a series of commands from a file or from standard input and collects them into a file known as a "crontab", which is later read and whose instructions are carried out. The name is derived from Greek chronos (χρόνος), meaning time.

Generally, the schedules modified by crontab are enacted by a daemon, crond, which runs constantly in the background and checks once a minute to see if any of the scheduled jobs need to be executed. If so, it executes them. These jobs are generally referred to as cron jobs.


Login automatically with SSH keys


With SSH you can securely log in to any Linux server and execute commands remotely. You can even use SSH to transfer and synchronize files from one server to another. Automating these tasks can make your life easier, but normally SSH gets in the way because it asks for your password every time you log in. Well, not anymore: in this article I will show you how to connect over SSH without a password.

About SSH keys

SSH keys allow machines to identify each other without you having to type the password every time. First, generate a key pair (nothing more than randomly generated sequences of bytes; see it as a fingerprint) on the machine you're going to connect from. Then install the public key of that pair on the machine that needs to accept the connection. The private key stays on the machine you connect from.
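On the machine that accepts the connection, installing a key comes down to appending the public key (a single line, for example the contents of ~/.ssh/id_rsa.pub on the connecting machine) to ~/.ssh/authorized_keys of the target user, with permissions sshd accepts. A minimal Python sketch of just that step, assuming OpenSSH's default authorized_keys location; install_public_key is a hypothetical helper, not taken from the instkey script, and the ssh-copy-id tool does the same job from the connecting machine.

```python
import os
import tempfile
from pathlib import Path

def install_public_key(pubkey_line: str, home: str) -> Path:
    """Append one public key to <home>/.ssh/authorized_keys unless it is already there."""
    ssh_dir = Path(home) / ".ssh"
    ssh_dir.mkdir(mode=0o700, exist_ok=True)
    # With StrictModes (the OpenSSH default), sshd refuses keys whose directory
    # or file is writable by other users, so tighten the permissions.
    os.chmod(ssh_dir, 0o700)
    auth = ssh_dir / "authorized_keys"
    key = pubkey_line.strip()
    existing = auth.read_text() if auth.exists() else ""
    if key not in existing.splitlines():
        with open(auth, "a") as f:
            if existing and not existing.endswith("\n"):
                f.write("\n")  # keep every key on its own line
            f.write(key + "\n")
    os.chmod(auth, 0o600)
    return auth

# Demo in a throwaway directory instead of a real home directory.
demo_home = tempfile.TemporaryDirectory()
path = install_public_key("ssh-rsa AAAAexample kevin@laptop", demo_home.name)
install_public_key("ssh-rsa AAAAexample kevin@laptop", demo_home.name)  # no duplicate
print(path.read_text(), end="")  # ssh-rsa AAAAexample kevin@laptop
```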

Little helper script

Installing keys takes several commands that aren't easy to remember. And if you have multiple servers, you might even want to automate the process. No worries, I did this for you: just download the helper script and install it. Open a terminal and type:
su -  # If you're going to use the keys to automate tasks, become root first
mkdir -p ~/bin
wget -O- "https://github.com/kvz/kvzlib/raw/master/bash/programs/instkey.sh" > ~/bin/instkey.bash
chmod 755 ~/bin/instkey.bash

Running the script: installing keys

Now with the script in place, installing SSH keys is easy. To allow easy access to server.example.com just open a terminal and type:
~/bin/instkey.bash server.example.com
The first time you run the script, it creates the necessary keys; when it asks for a passphrase, just hit Enter. Then it logs in to server.example.com (this is the last time you need to enter the server's password ;) and installs the key there.

Installing ssh keys under a different user

Make sure you are logged in as the user who should get passwordless SSH access. Let's say this user is called kevin.
Go to the directory where you saved the instkey.bash script, and type:
./instkey.bash server.example.com kevin
Notice the second argument? It makes sure kevin's key is installed for the remote user kevin instead of root. Easy, right?
Congratulations! You can now type
ssh server.example.com
And you'll be logged in right away! Another great idea is to use this technology to automatically synchronize files with rsync.

Pitfalls

  • Of course you should be really careful where and when to install SSH keys: if one machine is compromised, it's very easy for a cracker to hop to the next system without needing a password. So choose wisely when to use this technology.
  • Keys are user-specific. So if you're going to run programs as root that need to log in to other systems automatically, you must also install the key as root.

Tag: SSH key


About

A randomly generated sequence of bytes that machines use to identify each other over SSH. See it as a fingerprint.