Rotating Videos and Preserving Metadata

So this is a pet peeve that’s been biting me for some time.

Often when you take a picture with your phone or digital camera, the camera rotation sensor (gyroscope?) gets it wrong and ends up taking a picture that is sideways (i.e., rotated +/- 90 degrees).
Sure enough, you can use your phone’s photo library app to rotate it: job done, sanity restored.

However, the really, really annoying thing is when your camera gets the orientation of a video wrong. Rotating a video is just not something that the photo library offers off the shelf. That’s because rotating the actual frames of a video means re-encoding it from scratch. Some nice video players (e.g., VLC) allow you to rotate while playing the video, but this is fiddly and hard to remember to do every time you play the video.

So in any case, here are a few bash lines that I cobbled together to rotate a batch of videos while trying to preserve as much of the metadata as possible.

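# Rotate every .mov in the current directory by 90 degrees clockwise (transpose=1),
# copy the camera tags across with exiftool, print the resulting tags,
# and delete the backup file that exiftool leaves behind.
for IN in *.mov; do
  OUT="${IN}.m4v" && \
  ffmpeg -i "${IN}" -c:a copy -map_metadata 0:s:0 -vf "transpose=1" -c:v libx264 -crf 23 -preset medium "${OUT}" && \
  exiftool -tagsfromfile "${IN}" -makernotes -make -model "${OUT}" && \
  exiftool "${OUT}" && \
  rm "${OUT}_original"
done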

NOTE: To run these you will need to have ffmpeg and exiftool installed on your system.
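
For other rotations, the ffmpeg call above only needs a different `-vf` value. The following is a minimal Python sketch of that idea, not part of the original script: `build_rotate_command` and `ROTATION_FILTERS` are hypothetical names, and the function only builds the argument list (it does not run ffmpeg).

```python
import shlex

# ffmpeg's transpose filter: 1 rotates 90 degrees clockwise,
# 2 rotates 90 degrees counterclockwise. Applying transpose=1 twice gives 180.
ROTATION_FILTERS = {
    90: "transpose=1",
    180: "transpose=1,transpose=1",
    270: "transpose=2",
}


def build_rotate_command(src, dst, degrees_clockwise=90):
    """Return the ffmpeg argument list for rotating src into dst.

    Hypothetical helper: it mirrors the options used in the bash loop above
    and raises KeyError for angles other than 90, 180 or 270.
    """
    video_filter = ROTATION_FILTERS[degrees_clockwise]
    return [
        "ffmpeg", "-i", src,
        "-c:a", "copy",               # keep the audio stream untouched
        "-map_metadata", "0:s:0",     # same metadata mapping as the bash loop
        "-vf", video_filter,
        "-c:v", "libx264", "-crf", "23", "-preset", "medium",
        dst,
    ]


# Print a shell-quoted version of the command for inspection.
print(shlex.join(build_rotate_command("clip.mov", "clip.mov.m4v", 270)))
```

Passing the list to `subprocess.run` avoids the quoting problems that file names with spaces cause in shell loops.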

Posted in Programming, Software

Bash Shell $PS1 Configuration

After a recent talk with Jim Meyering I’ve decided to finally tidy up my .bashrc and all my .dot_files in general.

So the first and most important change: track all .dot_files in some form of version control system. Right now I’m using Mercurial hosted on BitBucket. Git is also a great choice. Go with whatever you are comfortable with, just make sure you don’t lose your precious configs and that you can effortlessly synchronise all your Unix/Linux/BSD boxes.

The other big takeaway from this talk was to make sure your $PS1 shell prompt gives you the right information. Two things that are absolute gold to have:

  1. The branch/bookmark you are currently in if you are in a VCS directory.
  2. The exit code of the previous command if it returned an error (different from 0).

Here’s what my current $PS1 looks like:

.bashrc $PS1

And this is my current .bashrc $PS1 configuration:

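# Decorate $PS1

# Walk up from the current directory until a VCS metadata directory is found.
function __get_vcs {
  local path=`pwd`
  while true; do
    if [[ -d "${path}/.hg" ]]; then
      echo "mercurial"
      return
    elif [[ -d "${path}/.git" ]]; then
      echo "git"
      return
    elif [[ "${path}" = "/" ]]; then
      echo "none"
      return
    fi
    path=`cd "${path}/../" && pwd`
  done
}

# Print " (hg:<branch>)" or " (git:<branch>)" when inside a repository.
function __get_vcs_branch {
  local vcs=`__get_vcs`
  local out=''
  case "$vcs" in
    mercurial)
      out=`hg id -b`
      out=" (hg:${out})"
      ;;
    git)
      out=`git rev-parse --abbrev-ref HEAD`
      out=" (git:${out})"
      ;;
  esac
  echo "$out"
}

# Print "[<code>] " when the previous command returned a non-zero exit code.
function __get_exit_code {
  local code="$?"
  local msg=''
  if [ $code != 0 ]; then
    msg="[${code}] "
  fi
  echo "$msg"
}

# ${red}, ${blue}, ${green}, ${purple} and ${black} are assumed to hold
# colour escape sequences defined elsewhere in .bashrc.
export PS1="${red}\$(__get_exit_code)${blue}\t ${green}\W${purple}\$(__get_vcs_branch)${blue} \$${black} "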
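
The directory walk that `__get_vcs` performs is easy to get subtly wrong in shell, so here is the same idea as a small Python sketch for comparison. It is only an illustration: `detect_vcs` is a hypothetical name, and like the bash version it only looks for `.hg` and `.git` directories.

```python
from pathlib import Path


def detect_vcs(start):
    """Return "mercurial", "git" or "none" for the directory `start`.

    Checks `start` and then each parent in turn, stopping at the first
    directory that contains a .hg or .git directory.
    """
    current = Path(start).resolve()
    for directory in [current, *current.parents]:
        if (directory / ".hg").is_dir():
            return "mercurial"
        if (directory / ".git").is_dir():
            return "git"
    return "none"


# Report the VCS for the current working directory.
print(detect_vcs("."))
```

The loop ends naturally once the filesystem root has been checked, which is the job the explicit `"/"` comparison does in the bash function.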

Finally, here are some other things that could be interesting to display:

  • \u – Username
  • \h – Hostname
  • \w – Full path of the current working directory

PS: Word of advice: it’s very easy to get carried away and try to cram all the information in the world into your $PS1. It’s up to you to decide what you value the most.

Posted in Programming

Cave Story

I thought I’d write a quick one about the amazing game Cave Story turned out to be. It takes you back to the old-school 2D platformers, with quite a high dose of difficulty and, at times, the mandatory frustration. Having finished it in easy mode, I can say that the final bosses in the other modes must take insane amounts of mastery.

But anyway, the point of this post was not so much the game as the fact that Daisuke Amaya single-handedly wrote the story, the music, the cinematics and the graphics, and coded it all, over the course of 5 years working in his spare time (as he held a full-time job). 5 years is a long time to be hammering at the same project, so I can only imagine it took a lot of perseverance (read: stubbornness) to stick with it and see it through to completion. Hats off to Daisuke. 🙂

Posted in Uncategorized

Japan Funny Signs

Japan has really awesome sights to visit and great food, and the people are extremely friendly and helpful. In the spirit of keeping tourists even more entertained, Japanese signs introduce subtle ‘easter eggs’ into their English translations. Here are some I found mildly amusing.

We’ll start with this very friendly ash cloud on a sign stuck to the floor right before a pedestrian crossing. It almost makes you wanna smoke to get your cutie genie cloud out of your bottle/cigarette.

Japan Funny Pics - 01

This should probably also be enforced at local theaters. I was also informed that “handy phone” is how Germans refer to cell phones – funny that the Japanese would go to a German to translate into English.

Japan Funny Pics - 02

This one is also commonly known as a “palmier”, but I like that they are descriptive about their pastries. Unfortunately, the showcased pastries often have nothing to do with what’s inside the box.

Japan Funny Pics - 03

Were it not for the foggy, cloudy weather, a gorgeous view of Mount Fuji would present itself at the top of this cable car – I mean ‘rorpeway’.

Japan Funny Pics - 04

Now that we are clear that we are taking the ‘rorpeway’ service we should get ready to head to the ‘platfome’.

Japan Funny Pics - 05

And now that we are done with our tour and the weather was so miserable that Mount Fuji was nowhere to be seen we shall go back down the ‘rorpeway’ onto our favorite ‘platfome’ and it’s promised to be ‘wicket’. That or we are soon to attend a cricket match.

Japan Funny Pics - 06

And just as we exit, Sherlock Holmes gives us a heads up by telling us to “watch our steps” – any wrong move and we are back in jail.

Japan Funny Pics - 07

So here I am right after hotel checkout. The hotel employee kindly asked me to confirm all the expenses at the hotel and once I acknowledged and paid it he professionally stamped the receipt as ‘Recieved by credit’ – a whole new level having hotel stamps with typos! One can never lower his guard.

Japan Funny Pics - 08

So the Bullet Train – Shinkansen seemed to have a rather complete and long manual on how to deal with emergencies… and they almost made it through – the final word let them down from perfection.

Japan Funny Pics - 09

Time for a snack, and I have to say this pastry shop was rather bold in using such expensive words that are tricky to spell – and they almost made it through…

Japan Funny Pics - 10

At a temple in Kyoto, I was already well aware of the challenges that the word ‘ticket’ presents, so the following ‘tiket’ price sign nailed to one of the temple’s walls came as no surprise to me.

Japan Funny Pics - 11

And free I indeed felt.

Japan Funny Pics - 12

This time around I was at a huge mall in Tokyo Station and when I glimpsed such a big sign with an English translation I instantly knew they were asking for trouble… (“Reataurant”, “Clothng”)

Japan Funny Pics - 13 Japan Funny Pics - 14

Finally, I had to dispose of my Coke bottle somewhere, and what better place to do so than the dust boxes at the entrance of the mall. It must be a cultural thing, but I often visit the mall without a hoover or a broom, so I used these boxes for what was most convenient at the time.

Japan Funny Pics - 15


There are still a couple more days until the trip is done, so stay tuned for more pics… 🙂

Posted in Funny, Photos, Sightseeing

Hadoop/Hive – Writing a Custom SerDe (Part 1)

(Special thanks to Denny Lee for reviewing this)

Apache Hadoop is an open source project composed of several software solutions for distributed computing. One interesting component is Apache Hive, which lets one leverage the MapReduce powers of Hadoop through the simple interface of the HiveQL language. This language is basically a SQL lookalike that triggers MapReduce jobs for operations that are highly distributable over huge datasets.

A common hurdle once a company decides to use Hadoop and Hive is: “How do we make Hadoop understand our data formats?” This is where the Hadoop SerDe terminology kicks in. SerDe is nothing but a short form for Serialization/Deserialization. Hadoop makes available quite a few Java interfaces in its API to allow users to write their very own data format readers and writers.

Step by step, one can make Hadoop and Hive understand new data formats by:
1) Writing format readers and writers in Java that call Hadoop APIs.
2) Packaging all code in a Java library, e.g. MySerDe.jar.
3) Adding the jar to the Hadoop installation and configuration files.
4) Creating Hive tables and explicitly setting the input format, the output format and the row format.

Before diving into the Hadoop API and Java code it’s important to explain what really needs to be implemented. To keep the terminology concise, I shall refer to a row as the individual unit of information that will be processed. In the case of good old SQL databases this indeed maps to a table row. However, our data source can be something as simple as Apache logs. In that case, a row would be a single log line. Other storage types might use complex message formats like Protobuf Messages, Thrift Structs, etc. For any of these, think of the top-level struct as our row. What they all have in common is that inside each row there will be sub-fields (columns), and those will have specific types like integer, string, double, map, …

So going back to our SerDe implementation, the first thing that will be required is the row deserializer/serializer (RowSerDe). This Java class will be in charge of mapping our row structure into Hive’s row structure. Let’s say each of our rows corresponds to a Java class (ExampleCustomRow) with these three fields:

  1. int id;
  2. String description;
  3. byte[] payload;

The RowSerDe should be able to mirror this row class and its properties into Hive’s ObjectInspector interface. For each of our types it’ll find and return the equivalent type in the Hive API. Here’s the output of our RowSerDe for this example:

  1. int id -> JavaIntObjectInspector
  2. String description -> JavaStringObjectInspector
  3. byte[] payload -> JavaBinaryObjectInspector
  4. class ExampleCustomRow -> StructObjectInspector

In the example above, the row structure is very flat, but in cases where our class contains other classes and so forth, the RowSerDe needs to be able to recursively reflect the whole structure into Hive API objects.
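
To make the recursive mapping concrete, here is a minimal Python sketch of the idea. It does not call the Hive API: the inspector names are plain strings copied from the list above, `describe` is a hypothetical helper, and the nested `Envelope` row is invented purely for illustration.

```python
from dataclasses import dataclass, fields, is_dataclass

# Stand-ins for the Hive inspectors named in the post (strings only).
PRIMITIVE_INSPECTORS = {
    int: "JavaIntObjectInspector",
    str: "JavaStringObjectInspector",
    bytes: "JavaBinaryObjectInspector",
}


def describe(row_type):
    """Recursively map a row type to a nested description of inspectors."""
    if row_type in PRIMITIVE_INSPECTORS:
        return PRIMITIVE_INSPECTORS[row_type]
    if is_dataclass(row_type):
        # A struct is described by the inspectors of each of its fields.
        return {
            "inspector": "StructObjectInspector",
            "fields": {f.name: describe(f.type) for f in fields(row_type)},
        }
    raise TypeError(f"No inspector known for {row_type!r}")


@dataclass
class ExampleCustomRow:
    id: int
    description: str
    payload: bytes


@dataclass
class Envelope:  # hypothetical row that nests another row
    sequence: int
    row: ExampleCustomRow


print(describe(Envelope))
```

A real RowSerDe would return ObjectInspector instances rather than strings, but the recursion has the same shape: primitives map directly, and structs recurse into their fields.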

Once we have a way of mapping our rows into Hive rows, we need to provide a way for Hadoop to read our files or databases that contain multiple rows and extract them one by one. This is done via the Input and Output format APIs. A simple format for storing multiple rows in a file would be separating them by newline characters (as comma-separated files do). An Input reader in this case would need to know how to read a byte stream and single out byte arrays of individual lines that would later be fed into our custom SerDe class.

As you can probably imagine by now, the Output writer needs to do exactly the opposite: it receives the bytes that correspond to each line and it knows how to append them and separate them (by newline characters) in the output byte stream.
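
The newline-separated layout described above can be sketched in a few lines of Python. This is only an illustration of the reader/writer split, not the Hadoop InputFormat/OutputFormat API; `read_rows` and `write_rows` are hypothetical names.

```python
import io


def read_rows(stream):
    """Yield each row of a binary stream as bytes, without its newline."""
    for line in stream:
        yield line.rstrip(b"\n")


def write_rows(stream, rows):
    """Append each row to a binary stream, terminated by a newline."""
    for row in rows:
        stream.write(row + b"\n")


# Round trip: write two rows, then read them back one by one.
buffer = io.BytesIO()
write_rows(buffer, [b"1,first row", b"2,second row"])
buffer.seek(0)
for raw_row in read_rows(buffer):
    print(raw_row)
```

Each `raw_row` is what would be handed to the row deserializer. The sketch assumes rows never contain a newline byte themselves, which is the same limitation the simple file format has.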

How a Hadoop MapReduce job interacts with a custom SerDe for Hive.

Summarizing, in order to implement a complete SerDe one needs to implement:
1) The Hive SerDe interface (contains both the Serializer and Deserializer interfaces).
2) The InputFormat interface and the OutputFormat interface.

In the next post I’ll take a deep dive into the actual Hadoop/Hive APIs and Java code.
(Two years have gone by and unfortunately I never got round to writing anything else. Anything I would write now would probably be outdated, so I would encourage anyone who has questions to ping me directly or just ask the Hive community.)

Posted in Programming, Software, Tech

Counting lines in files

Here’s a silly (and long) Python script I wrote a while back to count the number of lines in files recursively through a directory. (I guess I was fed up with ‘find ../ -iname "*cpp" | xargs wc -l’, or just utterly bored, or using Windows…)

# Usage:
#   ./ [path_to_dir_or_file]
#
# Requires Python 3.

__author__ = "Rui Barbosa Martins"

import optparse
import os
import sys


def getNumberOfLines(filePath):
  # Read as bytes so that binary or oddly encoded files do not raise.
  with open(filePath, "rb") as fp:
    return len(fp.readlines())


def visit(fileTypesIncluded, files, dirname, names):
  # Record the line count of every matching regular file in dirname.
  for f in names:
    filePath = os.path.join(dirname, f)
    if ((fileTypesIncluded == ["*"] or
         any(f.endswith(fe) for fe in fileTypesIncluded)) and
        os.path.isfile(filePath)):
      files[filePath] = getNumberOfLines(filePath)


def main(startPath, fileTypesIncluded):
  if not os.path.exists(startPath):
    print("Error: Path [%s] does not exist." % (startPath))
    sys.exit(1)
  elif os.path.isfile(startPath):
    files = {startPath: getNumberOfLines(startPath)}
  else:
    fileTypes = "|".join(fileTypesIncluded)
    absPath = os.path.abspath(startPath)
    print("Searching for extensions '%s' in '%s'." % (fileTypes, absPath))
    files = {}
    for dirname, _, filenames in os.walk(startPath):
      visit(fileTypesIncluded, files, dirname, filenames)

  keys = sorted(files)
  total = 0
  longestLength = 0

  # Find the longest path so that the counts line up in one column.
  for key in keys:
    if longestLength < len(key):
      longestLength = len(key)

  for key in keys:
    spaces = " " * (longestLength - len(key))
    print("%s%s %d" % (key, spaces, files[key]))
    total = total + files[key]

  if len(keys) > 1:
    print("Total lines of code: %d" % (total))


def parseArgs():
  usage = "Usage: %prog [Options] FILE_OR_DIRECTORY"
  parser = optparse.OptionParser(usage=usage)
  parser.add_option("-f", "--file_extensions", dest="file_extensions",
                    help="File extensions to count in. (comma separated)")
  (options, args) = parser.parse_args()
  fe = options.file_extensions
  if not args:
    path = "."
  elif len(args) == 1:
    path = args[0]
  else:
    print("Error: Too many paths provided. Only one expected.")
    sys.exit(1)

  # No extensions given, or a wildcard among them, means "count every file".
  if isinstance(fe, str):
    fe = [s.strip() for s in fe.split(",")]
  if not fe or any("*" in s for s in fe):
    fe = ["*"]
  return path, fe


if __name__ == "__main__":
  path, fe = parseArgs()
  main(path, fe)

Posted in Programming, Software

How to Fix Your Computer

A while back the venerable young man behind xkcd had a stroke of brilliance when, in the post below, he described the typical situation that the average IT person goes through time and time again.

Since I thought it would be a huge loss for such a good diagram not to reach the Portuguese non-IT community, where the diagram would clearly be most useful, I decided to translate it into Portuguese. Here is the link to an image with a resolution good enough for you to print and hang on your office wall.


Posted in Tech