2010-05-15

My understanding of MapReduce

I've recently read up on MapReduce, and this is my understanding.

MapReduce is a framework for distributing parallel computation over large datasets across a computer cluster.
It takes care of low-level tasks such as splitting and scheduling jobs, disk I/O, bandwidth management, and error detection and recovery.
It is suitable for simple computations on large datasets, where the computation on one part of the data does not affect the computation on another part (the parts are independent), which makes the work trivially parallelizable.
One way to understand it is that "Map" abstracts the transformation loops, while "Reduce" abstracts the aggregation loops.
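
To make the "loops" reading concrete, here is a minimal single-machine sketch in Python of a word count. It is not how any real MapReduce framework is implemented: the function names (map_phase, shuffle, reduce_phase, word_mapper, count_reducer, word_count) are made up for illustration, and everything a real framework handles (splitting, scheduling, I/O, recovery) is left out.

```python
from collections import defaultdict
from functools import reduce


def map_phase(records, mapper):
    """The transformation loop: apply mapper to each record independently."""
    pairs = []
    for record in records:
        pairs.extend(mapper(record))
    return pairs


def shuffle(pairs):
    """Group emitted values by key so each key can be reduced on its own."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, reducer):
    """The aggregation loop: fold each key's values into a single result."""
    return {key: reduce(reducer, values) for key, values in groups.items()}


# Hypothetical user-supplied functions for a word count.
def word_mapper(line):
    return [(word.lower(), 1) for word in line.split()]


def count_reducer(a, b):
    return a + b


def word_count(lines):
    return reduce_phase(shuffle(map_phase(lines, word_mapper)), count_reducer)
```

Calling word_count on a list of text lines returns a dictionary from each word to its number of occurrences. Because word_mapper looks at one line at a time and count_reducer only combines values for one key, both loops could in principle be spread over many machines without changing the result.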

There is a loose analogy with SQL, though the dataset here is not normalized (no JOIN). Also, the result set is not necessarily a subset of the input (or an aggregation of it), as it is in SQL. And "Map" is actually more generic than its counterpart in the SQL analogy. Some non-relational databases expose a MapReduce interface for querying data.

2010-03-07

Learning New Languages - What's So Hard About It?

Most advice on learning new languages is about "a lot of practice". Uh, sounds reasonable! Then what? Many people still have difficulty learning languages. Why is that? "Practice" is a rather vague term. Though I'm no expert, I'd like to share my thoughts on the topic as someone who has learnt 2 foreign languages quite well.

First is an article I ran into yesterday on Wikipedia: the Sapir-Whorf hypothesis.
These are some excerpts from the article:
In 1820 Wilhelm von Humboldt connected the study of language to the national romanticist program by proposing the view that language is the very fabric of thought, that is that thoughts are produced as a kind of inner dialog using the same grammar as the thinker's native language.
"No two languages are ever sufficiently similar to be considered as representing the same social reality. The worlds in which different societies live are distinct worlds, not merely the same world with different labels attached." - Edward Sapir
... Whorf was not principally concerned with translatability, but rather with how the habitual use of language influences habitual behavior. Whorf's point was that while English speakers may be able to understand how a Hopi speaker thinks, they are not actually able to think in that way.
Eric Lenneberg's formulation of the hypothesis:
1. Structural differences between language systems will, in general, be paralleled by nonlinguistic cognitive differences, of an unspecified sort, in the native speakers of the language.
2. The structure of anyone's native language strongly influences or fully determines the worldview he will acquire as he learns the language.

There is a stress on "native language". It seems to me this is because it is assumed that a person's proficiency in non-native (second, third, ...) languages is often (much) lower than in the native one. Furthermore, there is speculation that speakers of one language cannot think the way speakers of another language do. We know the best language learners are children, who don't have a fixed way of thinking or a fixed worldview (yet). I would say a fixed worldview is the root cause of adults' difficulty in learning a new language.

However, I disagree with the assertion of impossibility. I myself have learnt 2 languages rather successfully, not by practicing a lot, but by forcing myself to think in those languages, using the "kind of inner dialog using the same grammar" mentioned by Humboldt. I think this is very efficient, especially when the new language is immensely different from the native one (in my case the highly synthetic Russian versus the highly analytic Vietnamese).

So, what's the point of this? It is that a powerful way to learn a new language is to relax one's existing worldview and think, really think, in the new language. It is the highest form of language immersion. I suspect that this was how Mezzofanti "fluently spoke thirty-eight languages and forty dialects, despite never traveling outside of Italy." And let me stress this: translating into one's native language is extremely harmful for learning new languages.

What about "language superiority", and are computer languages related to this (say, Turing completeness and the universalist theory of language)? That would be another post.

2010-02-28

Emacs Usability Problem

I recently posted a rant on Twitter about how Emacs could be improved in the hands of Apple rather than the FSF. There was a response from a guy I don't know saying Apple would ruin Emacs with their "mousey" tendency. No, this has nothing to do with the mouse.

Emacs is good software and I like it a lot. I even have a lot of customization in .emacs and .emacs.d. Much of it is nice, but I think half of it shouldn't need to be there in the first place (e.g. heavy keybinding and usability customization, frequently used extensions). In this regard I think Emacs suffers from the same problem most Unix distributions face: highly customizable but barely usable (without customization). Users must spend more and more time customizing the thing just to use it. This explains why they failed, are failing and will continue to fail in the desktop market. Apple did a really good job here with Mac OS X.
Some people would argue that customization is fun. Yeah, it is, and actually many geeks like it, just like some people like masturbation. It's not bad, but rather pointless (Stuart Halloway made this analogy with Java's bloated frameworks in this video about concurrent programming in Clojure).

I know giving Emacs to Apple might not be a good idea due to all the technology "politics". The point here is that the FSF has something to learn from Apple about making usable stuff.

2010-02-09

"com.apple.boot.plist not found" error on multi-boot Hackintosh

I have a VAIO VGN-CS115J triple-booting Windows, Mac OS 10.5 and Ubuntu.

After reformatting my Windows partition as HFS+ (getting rid of Windows, yeah), I got the "com.apple.boot.plist not found" error. It seemed Disk Utility had given the newly reformatted partition the "bonus" of being active (that is, being the boot partition).

The error message was actually given by Chameleon, which could not find the boot.plist file in the active partition (of course). So I simply pressed F8, selected the Ubuntu partition, booted into it, fired up GParted, flagged the OS X partition as "boot", and restarted.

Multi-boot capability is a savior :) Without it I would have needed a live CD (which I currently don't have). And thumbs down for Disk Utility!

A side note: After I moved a large amount of data to the new partition, ATSServer worked like crazy. I guess it's Spotlight indexing PDF files being moved (lots of them :|).

2010-02-07

Firefox problem with Cappuccino

Today I played with Cappuccino and encountered a quirk with Firefox:
  • If the app is accessed via http://, it works, but the debug console shows that Firefox attempts to fetch Frameworks/AppKit/AppKit.j and Frameworks/Foundation/Foundation.j, and of course gets 404.
  • If the app is accessed via file://, it stops at the loading screen with the spinning indicator. The debug console shows Firefox complains about not finding some AppKit/Foundation classes in search paths.
UPDATED: Other browsers on Mac work normally (Safari, Opera, Chrome, OmniWeb) except for Camino and Flock, so I guess this has something to do with the Gecko engine.

2010-01-30

Setting up SLIME and Clojure on Aquamacs

UPDATED 2011-04-02: This is outdated, see http://dev.clojure.org/display/doc/Getting+Started instead.

UPDATED 2010-03-06: This works with Emacs 23 (Mac/Ubuntu) and latest versions of slime, swank-clojure and paredit. Ubuntu users can use apt-get/wget instead of port/curl.

There are already some instructions for setting up SLIME and Clojure. Most suggest using ELPA (Emacs Lisp Package Archive). This is nice given that ELPA is becoming the standard for managing Emacs packages. However, as of now I prefer not to use ELPA for several reasons:
  • ELPA SLIME is not up-to-date and includes only slime and slime-repl (no fancy features e.g. fuzzy completion).
  • ELPA paredit is not up-to-date.
  • The REPL set up by ELPA is not as friendly to Clojure as it is to Common Lisp.
So after using ELPA for a while I decided to set things up "manually". Here are the steps I took.

Gather the pieces

  • Get SLIME (I used MacPorts version 20100113_0)
sudo port -v install slime
  • Get paredit (version 21)
curl -O http://mumble.net/~campbell/emacs/paredit.el
  • Get clojure & clojure-contrib
Either using MacPorts
sudo port -v install clojure clojure-contrib
Or downloading directly
curl -O http://build.clojure.org/snapshots/org/clojure/clojure/1.1.0-master-SNAPSHOT/clojure-1.1.0-master-20091202.150145-1.jar
curl -O http://build.clojure.org/snapshots/org/clojure/clojure-contrib/1.1.0-master-SNAPSHOT/clojure-contrib-1.1.0-master-20091212.205045-1.jar
  • Get clojure-mode and swank-clojure (Emacs side)
git clone http://github.com/technomancy/clojure-mode.git
git clone http://github.com/technomancy/swank-clojure.git
  • Get swank-clojure (Clojure side)
Either downloading a pre-built jar file
curl -O http://repo.technomancy.us/swank-clojure-1.1.0.jar
Or building from source (assuming lein is installed)
cd path/to/dir/swank-clojure
lein jar
  • Put clojure, clojure-contrib and swank-clojure .jar files in ~/.swank-clojure or ~/.clojure (the default places where swank-clojure.el searches for them).

Configure Emacs

Add this to ~/.emacs
(add-to-list 'load-path "/opt/local/share/emacs/site-lisp/slime/")
(add-to-list 'load-path "/opt/local/share/emacs/site-lisp/slime/contrib/")
(add-to-list 'load-path "path/to/dir/clojure-mode/")
(add-to-list 'load-path "path/to/dir/swank-clojure/")
(add-to-list 'load-path "path/to/dir/paredit/")

;; Customize swank-clojure start-up to reflect possible classpath changes
;; M-x ielm `slime-lisp-implementations RET or see `swank-clojure.el' for more
(defadvice slime-read-interactive-args (before add-clojure)
  (require 'assoc)
  (aput 'slime-lisp-implementations 'clojure
        (list (swank-clojure-cmd) :init 'swank-clojure-init)))

(require 'slime)
(require 'paredit)
(require 'clojure-mode)
(require 'swank-clojure)

(eval-after-load "slime"
  '(progn
     ;; "Extra" features (contrib)
     (slime-setup 
      '(slime-repl slime-banner slime-highlight-edits slime-fuzzy))
     (setq 
      ;; Use UTF-8 coding
      slime-net-coding-system 'utf-8-unix
      ;; Use fuzzy completion (M-Tab)
      slime-complete-symbol-function 'slime-fuzzy-complete-symbol)
     ;; Use the parenthesis-editing mode paredit
     (defun paredit-mode-enable () (paredit-mode 1))
     (add-hook 'slime-mode-hook 'paredit-mode-enable)
     (add-hook 'slime-repl-mode-hook 'paredit-mode-enable)))

;; By default inputs and results have the same color
;; Customize result color to differentiate them
;; Look for `defface' in `slime-repl.el' if you want to further customize
(custom-set-faces
 '(slime-repl-result-face ((t (:foreground "LightGreen")))))

(eval-after-load "swank-clojure"
  '(progn
     ;; Make REPL more friendly to Clojure (ELPA does not include this?)
     ;; The function is defined in swank-clojure.el but not used?!?
    (add-hook 'slime-repl-mode-hook 
              'swank-clojure-slime-repl-modify-syntax t)
    ;; Add classpath for Incanter (just an example)
    ;; The preferred way to set classpath is to use swank-clojure-project
    (add-to-list 'swank-clojure-classpath
                 "path/to/incanter/modules/incanter-app/target/*")))