tobilehman.com: a blog on computing, structure and math

Analyzing Bash History and Fixing Typos

At the command line, I frequently type things too fast, and typos abound. A single character can mean the difference between showing documentation and deleting files (rm vs ri), so autocorrect is definitely a bad idea in this context.

Instead of a generic autocorrect, a better idea is to find the most common mistakes. To do so, I used frequency analysis like in this post to narrow down what I use most at the shell:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
$ cat ~/.bash_history |
     awk -F\| '{print $1}' |
     sort |
     uniq -c |
     sort -n |
     tail -15

 157 rake routes
 221 dbtt
 232 git fetch -p
 300 rails c
 370 gi ts
 376 g gs
 403 git add .
 405 rails s
 406 git b
 433 exit
 435 git lg
 663 git diff
1112 ls
1898 clear
4486 git s

Notice that gi ts is extremely common, I meant to type git s all those 376 times. As a solution, I could just alias it, but I would prefer a more general solution that would handle gi tdiff and gi tb as git diff and git b respectively.

I made the following script called ~/bin/gi:

1
2
3
4
5
6
7
8
9
#!/bin/sh

if   [[ $1 =~ 'ts' ]]; then
  git s
elif [[ $1 =~ 'tb' ]]; then
  git b
elif [[ $1 =~ 'tdiff' ]]; then
  git diff
fi

So that gi ts is no longer a mistake, it means what I meant it to mean. This saves me a few keystrokes, and it is a good example of why scripts in your path are generally better than aliases, since you can have logic in them.

How Many Possible Flags Are There?

I have been thinking about Mars a lot more lately, and about possible colonization. The Mars One project is a non-governmental not-for-profit organization that is looking to send groups of four people, independent of nationality, to Mars in 2023.

One thing that came to mind was independence, just as the early North American settlers declared independence from Great Britain, I think that Martian settlers would eventually declare independence from the countries of Earth, provided they had a sustainable, self-reliant colony.

As a side effect, the Martian settlers would probably choose a new flag, and then the math geek in me wondered how far this could go, how many different flags are possible? As humanity grows, evolves and expands, assuming that each nation that emerged had a flag, how many unique flags could possibly be created?

If we allow for any arbitrary size and aspect ratio, the number is infinite. However, most flags have the same aspect ratio, and their implementation as cloth is usually in fixed sizes.

Note that flags are physically made of thread, we make the simplifying assumption that all flags are made of the same width thread, and that the thread is evenly spaced.

Flags have some terminology, so a few definitions are in order:

  • Hoist is the width of the flag (vertical direction)
  • Fly is the length of the flag (horizontal direction)
  • Vexillology is the “scientific study of the history, symbolism and usage of flags [1]

We will call H the number of threads in the vertical direction, and F the number of threads in the horizontal direction.

Assuming threads are evenly spaced, we can imagine the H*F crossing points on a grid, as in the image below:

Each crossing point is either above or below, so there are 2 distinct choices for each of the H*F crossing points, that means that there are 2HF possible flags, ignoring color.

If we now consider the role of color, imagine that each of the H+F threads could have any of C distinct colors, then there would be C(H+F) possible color combinations.

Since the under/over configuration of the points is independent from the color choices, it follows from the combinatorial principle of products that there are:

\[ 2^{HF}C^{(H+F)} \]

possible flags. This is the general solution, now let’s find some real-world data and get some more constraints so we can compute some numbers. (Everything following this formula is just finding the values of H and F, so if you don’t care about the research, simplifying assumptions and data-wrangling, you can skip to the end)

Typically there are fixed aspect ratios, and some correlation exists between the height of the flagpole and the hoist/fly.

Height of the flagpole versus the fly and hoist

Using the United States’ Deparment of Interior specifications as a model, we can use the following data to get an approximate relation between the height of a ground flag and the hoist/fly of the flag:

Ground Flagpoles [2]

1
2
3
4
5
height (ft)  hoist (ft)    fly (ft)   aspect ratio (hoist/fly)
30           3.5           6.65       1.9
40           5.0           9.5        1.9
50           5.0           9.5        1.9
60           8.95          17         1.89

Since the aspect ratio is approximately constant (as we would expect), the problem of finding the relation between height, hoist and fly reduces to a one-dimensional linear regression. We now try to find fly as a function of height, which is in the y direction:

\[ f(y) = a + by \]

Using the least squares method, the values of a and b are found exactly, the above formula becomes:

\[ f(y) = 0.3105y + (-3.31) \]

So given a height y, the fly of the flag should be about (0.31)y - 3.31(ft).

Aspect ratios

To find the aspect ratios of the current flags of Earth, I found this on wikipedia. I went to the edit view and then copied the wiki source. On Mac OS X, the pbpaste command writes the contents of the clipboard to standard out on the command line. On GNU/Linux under Xorg, you can use xclip -o to achieve the same thing.

So I played around with the data and came up with this one-liner:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
> pbpaste | pre nts | awk -F\| '{print $3}' | sed 's/[\}]//g' | pcregrep '^\d' | sort -n | uniq -c
   1 0.820
   2 1
   1 1.154
   1 1.167
   1 1.25
   1 1.321
   5 1.333
   3 1.375
   1 1.389
   2 1.4
   2 1.429
   1 1.467
 114 1.5
   1 1.571
   5 1.6
   1 1.618
   1 1.636
  22 1.667
   2 1.75
   1 1.772
   1 1.864
   4 1.9
  83 2
   1 2.545

Most countries use 1.5, 2 and 1.667. As fractions, these are 3/2, 2/1, 5/3, respectively. Also, one country (Togo in Africa) uses 1.618 ≈ φ, the Golden Ratio!

Since the overwhelming majority of flags use the 1.5 and 2 ratios, let us assume for this problem that these are the only ratios that will be used. Since the United States flag uses the 1.9 ratio, we can approximate it as 2. Just for reference, Russia and China use 1.5 and U.S.A. uses 1.9, and the U.K. uses 2.

Colonizers on other planets will initially be close to the ground and spread out. Since residential flags typically range between 15 and 20 feet, we will be safe and assume that the inital flag is 15 feet tall. From our formula, this means that the Fly will be (.3)(15ft) - (3.31ft) = 1.19 ft.

Number of threads

To find the values of H and F, we need to know the width and spacing of the thread, a common size of polyester thread for making flags is Size 69, which has a diameter of 0.2921 mm. So, assuming that the threads are all adjacent, the number of threads in the Fly direction will be (1.19ft)/(0.2921 mm) ≈ 1241.

The number of threads in the Hoist direction (assuming a ratio of 1.5) is 1241*(1.5) ≈ 1861

Number of Colors Distinguishable by the Human Eye

This number is about 10,000,000 [4]

The number of distinct, 15 foot, 3/2 flags made of size 69 polyester thread is

\[ 2^{1861\times1241}(10,000,000)^{1861+1241} \approx 1.19 \times 10^{716943} \]

This is a 716,944 digit number, the number of possible flags is so much higher than the number of atoms in the observable Universe that it isn’t even plausible to assume that all of them could ever be exhausted.

White House Open Data Policy

Just yesterday, President Obama signed an executive order that requires government agencies to publish their data in “open, machine readable formats”:

the default state of new and modernized Government information resources shall be open and machine readable.

I have a hard time imagining better uses of the President’s dictator-like power than this.

Personally, I don’t believe the President (or any individual for that matter) should have the ability to make Laws without first submitting them to a review process and subsequently a vote. Executive Orders are problematic because they bypass Congress, it is a flaw in an otherwise reasonably balanced system:

However, the consequences of this particular executive order are in our favor, so this is a good thing, despite the fact that it came about because of a bad mechanism. Forcing the Bureaus to open up their data for public consumption enables individuals and groups outside the government to do things with that data that most of the bureaucrats could never have imagined.

All of this is great, provided the data are accurate, it is entirely possible that data could be ‘fudged’, ‘massaged’ or just plain made up. So in addition to the newly hackable government data, there should be a more active skepticism about the accuracy of that data. For example, if the Department of Homeland Security is reporting that there are cyber attacks coming from China, that data should be cross-checked with that of ISPs to ensure that there is a legitimate threat before any laws are passed or executive orders signed.

I think this is a good thing that came about for the wrong reasons, but the consequences are more important than the intentions, because the consequences really happen, intentions are just in the mind.

Fixed Point in Ruby Hash Function

A fixed point of a function \( f:S \to S \) is an element \(x \in S\) such that

\[ f(x) = x \]

That is, \(f\) is a no-op on \(x\). Some examples:

(Check out that link above, fmota wrote about how they discovered a fixed point in the base64 encoding function, it’s very interesting)

Ruby’s Fixnum class has an instance method called hash. It is the hash function used by the Hash class to locate the value.

One thing to note that is interesting,

1
42.class == 42.hash.class # true

The integer literal 42 is an instance of Ruby’s Fixnum class, which is exactly the type that is returned by Fixnum#hash. So, if we let N be the set of all Fixnum values, and h be the hash function, then the function

\[h: N \to N \]

Does h have a fixed point? Let’s find out, the generic way to find a fixed point is to apply the function over and over and see if any of the iterates are the same:

\[ x, f(x), f(f(x)), f(f(f(x))), f(f(f(f(x)))), … \]

In Ruby, we could start with a value n and loop until the next step is the same as the current step:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
def find_fixed_point(n)
  m = n.hash

  while n != m
    puts n

    n = m
    m = m.hash
  end

  puts n
  puts m
end

find_fixed_point(42)

This code terminates in 62 steps, here is the output:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
42
1818615832163790001
97302458964831319
3241638738618469355
-1538644867632915805
4556542729113842835
-707745146237515789
4042604241838953267
-3938749251519753037
-3262109345615183437
2726245977638182835
2363300705344768947
-1077013243652537421
3673817879955862451
4480325791167763379
-3402798086540651597
4108231692027892659
742946247983240115
3380480562708485043
-3611524319884209229
2461606551736423347
2556374051055866803
-853528980180560973
301437974151041971
-684460774007630925
2785951334519935923
1234765569947210675
3485015807817552819
-2988541774381313101
-2969442663896050765
3743208565546292147
-2143850698816220237
985968426639299507
-2191943438346873933
465213455999570867
-1249312491853966413
-1963857645314632781
3582438201892410291
146054934450017203
-2298892513473850445
-813726632499604557
-1775501339152477261
-4287223502620716109
-2436529928794664013
-3361799749893745741
487423333182608307
4144170308747006899
1852752892089734067
1009031649399542707
-1504821367603326029
-1663010304514714701
1979275894121173939
657469403487933363
-3805597827236228173
-608042091803176013
3625341557925090227
-4337022583265946701
4381946295323333555
-3544389048848739405
-4409080177303874637
3084909602640630707
1931988098033783731
-373854911179910221
4237831107247477683
4237831107247477683

So the integer 4237831107247477683 is a fixed point of Fixnum#hash, that means that in the implementation of Hash, the value 4237831107247477683 would have itself as a key.

There are more examples (play with the code yourself!), and I would like to look deeper into why this hash function has a fixed point.

Visualization of SICP Exercise 1.14

I am currently working my way the Structure and Interpretation of Computer Programs and I’ve skipped past exercise 1.14, and come back to it after a bit of thinking, here’s the problem, and then the exercise.

The Problem

How many ways are there to make change of a given amount a with the following kinds of coins?

  • pennies
  • nickels
  • dimes
  • quarters
  • half-dollars

There is a recursive solution to this combinatorial problem, which can readily be made into executable code in Scheme, this kind of solution is very standard in enumerative combinatorics:

The number of ways to change amount a using n kinds of coins equals:

  • the number of ways to change amount a using all but the first kind of coin, plus
  • the number of ways to change amount a - d using all n kinds of coins, where d is the denomination of the first kind of coin

Note that those two items are mutually exclusive and exhaustive conditions, so the result can be calculated by simply adding the two values.

In scheme, the above list could be transliterated as:

1
2
(+ (cc a (- n 1))
   (cc (- a d) n))

Where (cc a n) computes the number of ways of changing amount a with n kinds of coins.

The full code for the count-change procedure can be found here.

The Exercise

With the count-change procedure at hand, Exercise 1.14 is to “draw the tree illustrating the process generated by the count-change procedure in making change for 11 cents.”

The Solution

The count-change procedure uses the (cc a n) procedure where n = 5, and the cc procedure naturally gives rise to a binary tree that locally looks like this:

I prefer to make the computer go through all the steps and produce an image for me, so I took a break on 1.14 and thought about it for a while.

To represent the tree, I used the graph-description language DOT

To generate the tree, I started by adding a print statement around the recursion steps, the problem with that is that there can be distinct nodes that happen to have the same argument values, that is, the node in the tree may be labeled (cc a n), but there may also be multiple nodes with the same a and n values. To avoid this, each node must be given a unique id, and then be displayed with the (cc a n) label.

One way to label a binary tree’s nodes is to make the id be a map of the location of the node in the tree. For example, if a node of the tree has id x, then the root’s children will be xl and xr, respectively, where l stands for ‘left’ and r stands for ‘right’.

If the root’s id is s, then a typical node would be labeled something like sllrrl. Starting at the root, you can find the node by going left two times, right two times, and then left.

Here is the full source of the tree-generating code cc-graph:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
(define (cc-graph amount kinds-of-coins)

  (define display-node (lambda (label amount kinds-of-coins)
                         (begin
                           (display "  ")
                           (display label)
                           (display " [label=\"")
                           (display `(cc ,amount ,kinds-of-coins))
                           (display "\"];")
                           (newline))))

  (define display-edge (lambda (a b)
                         (begin
                           (display "  ")
                           (display a)
                           (display " -> ")
                           (display b)
                           (display ";")
                           (newline))))

  (define base-case (lambda (amount kinds-of-coins)
                      (or (< amount 0)
                          (= kinds-of-coins 0)
                          (= amount 0))))

  (define left (lambda (label)
                  (string-append label "l")))

  (define right (lambda (label)
                  (string-append label "r")))


  ; label is the unique label of the function invocation
  (define (recurse label amount kinds-of-coins)
    (if (base-case amount kinds-of-coins)
        (display-node label amount kinds-of-coins)
        (begin
          (display-node label amount kinds-of-coins)
          (display-edge label (left label))
          (display-edge label (right label))
          (recurse (left label) amount (- kinds-of-coins 1))
          (recurse (right label)
                   (- amount (first-denomination kinds-of-coins))
                   kinds-of-coins))))

  (begin
    (display "digraph {")
    (newline)
    (recurse "s" amount kinds-of-coins)
    (newline)
    (display "}")))

Finally, the output of running (cc-graph 11 5), then piping the results into GraphViz gives the desired tree:

I love this way of visualizing recursion, you can see how the problem is reduced into simpler sub-problems, and that there is a distinct ‘shape’ to the computation.

There are more than 100 edges in that tree, I would not have wanted to do that by hand, all for a measley value of four.

The final value of (cc 11 5) is 4, that is, there are 4 ways of making change for 11 cents. Unfortunately, this solution doesn’t say what exact combinations of coins, only that there are four.

Just thinking about it, you can make 11 cents with

  • 11 pennies
  • 6 pennies, 1 nickel
  • 1 penny, 2 nickels
  • 1 penny, 1 dime

I would like to generalize cc-graph so that I can get a visualization of any recursive function in Scheme, this will take more knowledge of the language and it’s introspective features, stay tuned!

Lies Damned Lies and Statistics

There is a quote, usually attributed to Mark Twain that goes something like:

“There are three kinds lies. Lies, Damned Lies, and Statistics.”

My interpretation of this is that statistics are supposed to be the worst kind of lie, or that the worst kinds of lies use statistics.

The thing that bothers me most about this quote (and the innumerable minor variations of it that get repeated) is that the word ‘statistics’ comes at the end.

Why does that matter? Notice that the list is presented as a sequence, an increasing sequence of damned-ness, and the presence of the word ‘statistics’ at the end is supposed to imply that it is ‘more damned’ than damned lies.

This interpretation bothers me because the implied damned-ness is based on the initial correlation, and that correlation is only based on two data points. The quote depends on a misunderstanding of statistics. Anyone who has studied a little bit of statistics will know not to trust an inference based on a correlation in a data set of only two points!

Hooking Jenkins Up to a Computer-controlled Light Switch

About a week ago I wrote about how to hook up a light switch to a raspberry pi. Having a computer-controlled light switch is nice, but the novelty wears off pretty quickly. The next question that arises usually is how can I make this useful?

At work, our continuous integration server, which runs Jenkins, lets us know when one of the team members has broken the build. To make sure that we get the memo promptly so we can commence with the public shaming, we use tools that change color to indicate the current test status.

The problem with our current way of doing things is that there is no sound, and it requires that someone be at their computer. To remedy this situation, we wired up physical lights to a raspberry pi running a Debian GNU/Linux variant, and wrote this script to toggle the lights.

  • On means Passing (all jobs passed)
  • Off means Failing (at least one job failed)

The data flows from Jenkins to the Raspberry-Pi as follows:

gpio stands for (General Purpose Input/Output), the utility is part of the wiringPi package

NOTE: You need to set the following environment variables:

  • JENKINS_USERNAME

  • JENKINS_PASSWORD

  • JENKINS_HOSTNAME

Dependencies:

1
2
3
4
5
curl --user $JENKINS_USERNAME:$JENKINS_PASSWORD $JENKINS_HOSTNAME/api/json 2> /dev/null                      |   # (Jenkins Job Statuses)

ruby -rjson -e 'puts (JSON.parse(STDIN.read)["jobs"].map{|j| j["color"]}.all?{|c| c=="blue"} ? "up":"down")' |   # (JSON parser/extractor) 

while read line; do   gpio write 4 $line;     done                                                               # (gpio utility)

This script runs on the raspberry pi itself.

Unmarshalling a List of JSON Objects of Different Type in Go

This post started with mattyw’s blog post Using go to unmarshal json lists with multiple types

To summarize the article, we are given a JSON string of the form:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
{
    "things": [
        {
            "name": "Alice",
            "age": 37
        },
        {
            "city": "Ipoh",
            "country": "Malaysia"
        },
        {
            "name": "Bob",
            "age": 36
        },
        {
            "city": "Northampton",
            "country": "England"
        }
    ]
}

And our goal is to unmarshal it into a Go data structure. The article goes into more detail, and two solutions were proposed. A commenter came up with a third solution, and another commenter dustin proposed using his library called jsonpointer, which operates on the raw byte array of the json string, instead of unmarshalling first and then traversing the data structure.

I used Dustin’s library, and to great avail, the only gotcha was that json strings were returned with the double quotes in them and some trailing spaces, but I made a little function that returned a slice of the original bytes:

1
2
3
func trimJsonBytes(toTrim []byte) []byte {
    // implementation found in solution.go
}

Here is the algorithm:

1
2
3
4
5
6
7
8
9
10
11
12
13
loop i:=0,1,2,...
  if  /things/i/name  is empty
    make a person
    append it to the person array

  else if /things/i/city is empty
    make a place
    append it to the places array

  else
    end of array, break out of loop

end loop

The full source code can be found here

Make a Computer-controlled Light Switch With a Raspberry Pi

To build a computer-controlled light switch, you will need:

The powerswitch tail looks like an extension cord with some holes in it to wire it into your own circuit. Connect the powerswitch to the raspberry pi as in the image below (on is connected to pin 23):

Then, the following python program will allow you to type ./switch on or ./switch off from the command line as root.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
#!/usr/bin/env python
# Turn switch on/off
# by tlehman

import RPi.GPIO as io
import sys

io.setmode(io.BCM)
pin = 23
io.setwarnings(False)
io.setup(pin, io.OUT)                 # set pin 23 as output

if len(sys.argv) > 1:
    state = sys.argv[1]               # get command line argument ('on' or 'off')
    io.output(pin, state == 'on')     # if state is on, let pin 23 connect the circuit, otherwise break the circuit
else:
    print("Usage: ")
    print("  ./switch (on|off)")

To run this, carefully plug in a lamp (or other appliance that uses a standard 120V U.S. outlet), then

1
2
3
sudo su
./switch on
./switch off

Here is a video of the light being switched off and then back on, not very exciting, but it works:

This on it’s own is not very useful or amusing, but this can easily be tied together with any API or command line utility. For example, I plan to connect this to our continuous integration server at work so that every time the tests fail, the switch turns some lights off, this could be achieved with a cron job, or perhaps a hook on Jenkins that sends a signal to the raspberry pi, there are so many possbilities.

Swap Values in C Without Intermediate Variable

Using the following properties of the XOR function:

  • Associativity
\[(a \oplus b) \oplus c = a \oplus (b \oplus c) \]
  • Commutativity
\[a \oplus b = b \oplus a \]
  • Identity
\[a \oplus 0 = a \]
  • Self-Inverse
\[a \oplus a = 0 \]

As a bit of trivia, note that all n-bit integers form an Abelian Group under XOR. The proof of which can be found by using the obvious isomorphism of n-bit integers with {0,1}n under addition modulo 2. Note that addition modulo 2 is equivalent to bitwise XOR.

So, using the C programming language, we can use the convenient ^= operator as a way to swap the values of a and b without using an intermediate variable.

1
2
3
  a ^= b;      // (a ^ b)
  b ^= a;      // b ^ (a ^ b)   which is the original a
  a ^= b;      // (a ^ b) ^ b   which is the original b

Here is a full working program that implements this operation using a C macro:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
#include <stdio.h>

#define show(a,b)    printf("a = %d, b = %d\n", a, b);

#define swap(a,b) \
  a^=b;  \
  b^=a;  \
  a^=b;

int main(int argc, char *argv[]) {
  int a = 3, b = 5;
  show(a,b);

  swap(a,b);

  show(a,b);
  return 0;
}