Cave Story

I thought I’d write a quick one about what an amazing game Cave Story turned out to be. It takes you back to the old-school 2D platformers, with quite a high dose of difficulty and, at times, the mandatory frustration. Having finished it in easy mode, I can say that the final bosses in the other modes must take insane amounts of mastery.

But anyway, the point of this post is not so much the game as the fact that Daisuke Amaya single-handedly wrote the story, the music, the cinematics and the graphics, and coded it all, over the course of 5 years in his spare time (as he held a full-time job). 5 years is a long time to be hammering at the same project, so I can only imagine it took a lot of perseverance (read: stubbornness) to stick with it and see it through to completion. Hats off to Daisuke. :)

Posted in Uncategorized

Japan Funny Signs

Japan has really awesome sites to visit, great food and the people are extremely friendly and helpful. In the spirit of keeping tourists even more entertained, Japanese signs introduce subtle ‘easter eggs’ into their English translations. Here are some I found mildly amusing.

We’ll start with this very friendly ash cloud on a sign stuck to the floor right before a pedestrian crossing. Almost makes you wanna smoke to get your cutie genie cloud out of your bottle/cigarette.

Japan Funny Pics - 01

This should probably also be enforced at local theaters. I was also informed that “handy phone” is how Germans refer to cell phones; funny that the Japanese would go to a German to translate into English.

Japan Funny Pics - 02

This one is also commonly known as a “palmier”, but I like that they are descriptive about their pastries. Unfortunately, the showcased pastries often have nothing to do with what’s inside the box.

Japan Funny Pics - 03

Were it not for the foggy, cloudy weather, a gorgeous view of Mount Fuji would present itself at the top of this cable car – I mean ‘rorpeway’.

Japan Funny Pics - 04

Now that we are clear that we are taking the ‘rorpeway’ service we should get ready to head to the ‘platfome’.

Japan Funny Pics - 05

And now that we are done with our tour and the weather was so miserable that Mount Fuji was nowhere to be seen we shall go back down the ‘rorpeway’ onto our favorite ‘platfome’ and it’s promised to be ‘wicket’. That or we are soon to attend a cricket match.

Japan Funny Pics - 06

And just as we exit, Sherlock Holmes gives us a heads up by telling us to “watch our steps” – any wrong move and we are back in jail.

Japan Funny Pics - 07

So here I am right after hotel checkout. The hotel employee kindly asked me to confirm all the expenses at the hotel and once I acknowledged and paid it he professionally stamped the receipt as ‘Recieved by credit’ – a whole new level having hotel stamps with typos! One can never lower his guard.

Japan Funny Pics - 08

So the Bullet Train – Shinkansen seemed to have a rather complete and long manual on how to deal with emergencies… and they almost made it through – the final word let them down from perfection.

Japan Funny Pics - 09

Time for a snack, and I have to say they were rather bold at this pastry shop, using such expensive, tricky-to-spell words – and they almost made it through…

Japan Funny Pics - 10

At a temple in Kyoto, I was already well aware of the challenges that the word ‘ticket’ presents, so the following ‘tiket’ price sign nailed to one of the temple’s walls came as no surprise.

Japan Funny Pics - 11

And free I indeed felt.

Japan Funny Pics - 12

This time around I was at a huge mall in Tokyo Station and when I glimpsed such a big sign with an English translation I instantly knew they were asking for trouble… (“Reataurant”, “Clothng”)

Japan Funny Pics - 13 Japan Funny Pics - 14

Finally, I had to dispose of my Coke bottle somewhere, and what better place to do so than the dust boxes at the entrance of the mall. It must be a cultural thing, but I often visit the mall without a hoover or a broom, so I used these boxes for what was most convenient at the time.

Japan Funny Pics - 15


Still a couple more days until the trip is done, so stay tuned for more pics… :)

Posted in Funny, Photos, Sightseeing

Hadoop/Hive – Writing a Custom SerDe (Part 1)

(Special thanks to Denny Lee for reviewing this)

Apache Hadoop is an open source project composed of several software solutions for distributed computing. One interesting component is Apache Hive, which lets one leverage the MapReduce power of Hadoop through the simple interface of the HiveQL language. This language is basically a SQL lookalike that triggers MapReduce jobs for operations that are highly distributable over huge datasets.

A common hurdle once a company decides to use Hadoop and Hive is: “How do we make Hadoop understand our data formats?” This is where the term SerDe kicks in. SerDe is nothing but a short form of Serialization/Deserialization. Hadoop makes available quite a few Java interfaces in its API to allow users to write their very own data format readers and writers.

Step by step, one can make Hadoop and Hive understand new data formats by:
1) Writing format readers and writers in Java that call Hadoop APIs.
2) Packaging all the code in a Java library, e.g. MySerDe.jar.
3) Adding the jar to the Hadoop installation and configuration files.
4) Creating Hive tables and explicitly setting the input format, the output format and the row format.

Before diving into the Hadoop API and Java code, it’s important to explain what really needs to be implemented. For conciseness, I shall refer to a row as the individual unit of information that will be processed. In the case of good old SQL databases this indeed maps to a table row. However, our data source can be something as simple as Apache logs. In that case, a row would be a single log line. Other storage types might use complex message formats like Protobuf messages, Thrift structs, etc. For any of these, think of the top-level struct as our row. What they all have in common is that inside each row there will be sub-fields (columns), and those will have specific types like integer, string, double, map, …
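To make the row/column idea concrete, here is a minimal sketch in Python (chosen only for brevity; the real SerDe would be Java) that treats one log line as a row and splits it into typed columns. The simplified three-field log layout and the names `LogRow` and `parse_row` are assumptions made up for this illustration and are not part of Hadoop or Hive.

```python
from typing import NamedTuple


class LogRow(NamedTuple):
    """One row: a single log line split into typed columns."""
    host: str       # string column
    status: int     # integer column
    seconds: float  # double column


def parse_row(line: str) -> LogRow:
    # Assumed layout: "<host> <status> <seconds>", separated by whitespace.
    host, status, seconds = line.split()
    return LogRow(host, int(status), float(seconds))


row = parse_row("10.0.0.1 200 0.25")
print(row)
```

Whatever the storage format, the job of the deserializer is the same: take the raw bytes of one row and expose them as named, typed fields.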

So, going back to our SerDe implementation, the first thing that will be required is the row deserializer/serializer (RowSerDe). This Java class will be in charge of mapping our row structure into Hive’s row structure. Let’s say each of our rows corresponds to a Java class (ExampleCustomRow) with the following three fields:

  1. int id;
  2. string description;
  3. byte[] payload;

The RowSerDe should be able to mirror this row class and its properties into Hive’s ObjectInspector interface. For each of our types it’ll find and return the equivalent type in the Hive API. Here’s the output of our RowSerDe for this example:

  1. int id -> JavaIntObjectInspector
  2. string description -> JavaStringObjectInspector
  3. byte[] payload -> JavaBinaryObjectInspector
  4. class ExampleCustomRow -> StructObjectInspector

In the example above the row structure is very flat, but in cases where our class contains other classes, and so forth, the RowSerDe needs to be able to recursively reflect the whole structure into Hive API objects.
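The recursive reflection can be sketched in a few lines of Python (again, only as an illustration; this is not the Hive ObjectInspector API). The sketch walks a class and produces a Hive-style type string such as `struct<id:int,...>`. The `PRIMITIVES` table, the `hive_type` function and the nested `Header` class are hypothetical names introduced here to show the recursion.

```python
from dataclasses import dataclass, fields, is_dataclass

# Assumed mapping from Python leaf types to Hive type names.
PRIMITIVES = {int: "int", str: "string", bytes: "binary", float: "double"}


def hive_type(py_type) -> str:
    """Recursively describe a type as a Hive-style type string."""
    if py_type in PRIMITIVES:
        return PRIMITIVES[py_type]
    if is_dataclass(py_type):
        # A nested class becomes a struct whose members are described in turn.
        members = ",".join(
            f"{f.name}:{hive_type(f.type)}" for f in fields(py_type)
        )
        return f"struct<{members}>"
    raise TypeError(f"no mapping for {py_type!r}")


@dataclass
class Header:
    source: str
    version: int


@dataclass
class ExampleCustomRow:
    id: int
    description: str
    payload: bytes
    header: Header  # nested class, added here only to show the recursion


print(hive_type(ExampleCustomRow))
```

A real RowSerDe would return ObjectInspector instances instead of strings, but the shape of the traversal is the same: leaf types map directly, and every nested class triggers another pass.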

Once we have a way of mapping our rows into Hadoop rows, we need to provide a way for Hadoop to read our files or databases that contain multiple rows and extract them one by one. This is done via the Input and Output format APIs. A simple format for storing multiple rows in a file would be separating them by newline characters (like comma-separated files do). An Input reader in this case would need to know how to read a byte stream and single out the byte arrays of individual lines, which would later be fed into our custom SerDe class.

As you can probably imagine by now, the Output writer needs to do exactly the opposite: it receives the bytes that correspond to each line and knows how to append them to the output byte stream, separated by newline characters.
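As a rough illustration of the two roles, here is a Python sketch of a newline-delimited reader and writer working on a plain byte stream. The function names `read_rows` and `write_rows` are made up for this example; the actual Hadoop InputFormat and OutputFormat interfaces are Java and have more moving parts (splits, record readers, job configuration).

```python
import io
from typing import BinaryIO, Iterable, Iterator


def read_rows(stream: BinaryIO) -> Iterator[bytes]:
    """Input side: yield the raw bytes of each newline-terminated row."""
    for line in stream:
        yield line.rstrip(b"\n")


def write_rows(stream: BinaryIO, rows: Iterable[bytes]) -> None:
    """Output side: append each row followed by a newline separator."""
    for row in rows:
        stream.write(row + b"\n")


buffer = io.BytesIO()
write_rows(buffer, [b"1,first", b"2,second"])
buffer.seek(0)
rows = list(read_rows(buffer))
print(rows)
```

Each byte array coming out of the reader is what would be handed to the SerDe for deserialization, and each byte array going into the writer is what the SerDe produced when serializing a row.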

How a Hadoop MapReduce job interacts with a custom SerDe for Hive.

Summarizing, in order to implement a complete SerDe one needs to implement:
1) The Hive SerDe interface (which contains both the Serializer and Deserializer interfaces).
2) The InputFormat interface and the OutputFormat interface.

In the next post I’ll take a deep dive into the actual Hadoop/Hive APIs and Java code.
(Two years have gone by and I unfortunately never got round to writing anything else. Anything I wrote now would probably be outdated, so I would encourage anyone who has questions to ping me directly or just ask the Hive community.)

Posted in Programming, Software, Tech

Counting lines in files

Here’s a silly (and long) Python script I wrote a while back to count the number of lines in files recursively through a dir. (I guess I was fed up with ‘find ../ -iname "*cpp" | xargs wc -l’, or just utterly bored, or using Windows…)

# Usage:
#   ./ [path_to_dir_or_file]

__author__ = "Rui Barbosa Martins"

import optparse
import os
import sys

def getNumberOfLines(filePath):
  # Read as bytes so files that are not valid text still get counted.
  with open(filePath, "rb") as fp:
    return sum(1 for _ in fp)

def collectFiles(startPath, fileTypesIncluded):
  # Walk the tree and map every matching file path to its line count.
  files = {}
  for dirname, _, names in os.walk(startPath):
    for f in names:
      filePath = os.path.join(dirname, f)
      if (fileTypesIncluded == ["*"] or
          any(f.endswith(fe) for fe in fileTypesIncluded)):
        files[filePath] = getNumberOfLines(filePath)
  return files

def main(startPath, fileTypesIncluded):
  if not os.path.exists(startPath):
    print("Error: Path [%s] does not exist." % startPath)
    sys.exit(1)
  elif os.path.isfile(startPath):
    files = {startPath: getNumberOfLines(startPath)}
  else:
    fileTypes = "|".join(fileTypesIncluded)
    absPath = os.path.abspath(startPath)
    print("Searching for extensions '%s' in '%s'." % (fileTypes, absPath))
    files = collectFiles(startPath, fileTypesIncluded)

  keys = sorted(files)
  total = 0
  # Pad every path to the longest one so the counts line up in a column.
  longestLength = max([len(key) for key in keys] or [0])

  for key in keys:
    spaces = " " * (longestLength - len(key))
    print("%s%s %d" % (key, spaces, files[key]))
    total = total + files[key]

  if len(keys) > 1:
    print("Total lines of code: %d" % total)

def parseArgs():
  usage = "Usage: %prog [Options] FILE_OR_DIRECTORY"
  parser = optparse.OptionParser(usage=usage)
  parser.add_option("-f", "--file_extensions", dest="file_extensions",
                    help="File extensions to count in. (comma separated)")
  (options, args) = parser.parse_args()
  fe = options.file_extensions
  if not args:
    path = "."
  elif len(args) == 1:
    path = args[0]
  else:
    print("Error: Too many paths provided. Only one expected.")
    sys.exit(1)

  # Normalize the extension list; no list or any "*" means "count everything".
  if isinstance(fe, str):
    fe = [s.strip() for s in fe.split(",")]
  if not fe or any("*" in s for s in fe):
    fe = ["*"]
  return path, fe

if __name__ == "__main__":
  main(*parseArgs())

Posted in Programming, Software

How to fix your computer

A while back the venerable young man behind xkcd had a stroke of brilliance when he described, in the post below, the typical situation that the average IT person goes through time and time again.

Since I thought it would be a huge loss for such a good diagram not to reach the Portuguese non-IT community, where it would clearly be most useful, I decided to translate it into Portuguese. Here is the link to an image with a resolution good enough for you to print and hang on your office wall.


Posted in Tech

Spring’s arrived

Holland Park - Kensington - 04

For the past two weeks the weather in London has been simply amazing. Last weekend was the culmination of it all. Went for a leisurely stroll around Kensington and behold the scenery I contemplated at Holland Park! And to improve it even further the clear blue sky was even clearer than usual due to this flight ban.

So many times I’ve heard the joke “We said send cash not ash”, and I must confess it used to be pretty funny! However, now that I’m actually crossing my fingers to be able to fly at the end of the week, it doesn’t seem so nice anymore. Makes one wonder how much we take simple things for granted. Having just seen both Zeitgeist movies, it also makes me think about conspiracy theories – I mean, the sky is clear blue, with not the faintest sign of grayish ash in the air. :)

Well, I’m pretty sure the airlines are doing what they can to get flights up and running again, and it is obviously better not to fly than to have some “flight incidents”. Fingers crossed and maybe I’ll find out tomorrow that this new ash cloud is being blown away towards the US or maybe Greenland.

Ah, almost forgot, out of irony one of the few flights taking off from London today, guess where it went?? That’s it, Reykjavik… maybe the UK just sent a few technicians to cover up the hole and be done with this ordeal.

Posted in Sightseeing

SimplePhoto version 1.0.0

Finally, after almost a year battling away with wxWidgets, GraphicsMagick, gcc, Visual Studio, CppUnitLite, … I get to release version 1.0.0 of SimplePhoto.

SimplePhoto is a batch processing application for images. For the time being it supports changing image format, changing image dimensions, and groupings.

My main focus for this application was to make its memory footprint as small as possible. It’s implemented in C++, and on Windows it takes around 5MB of memory when running. There are still loads of features to implement, but I really wanted to get this out there ASAP. The next step is to open source it (probably hosting it at

Have fun!

Posted in Programming, Software, Tech