tahnan | Aug. 4th, 2005

Sophomore year, I took a computer science class--CS 15, Introduction to Program Design, I think it was called--and decided that (a) it was fun and (b) I didn't want to spend my life doing that.

In the last few days, when I needed a break from my dissertation, I've been idly working on one of the programming challenges from ITA Software, who'd advertised them on the T. I decided that most of them looked boring, but the one marked "(hard!)" was interesting, so I gave it a shot.

A few days later, I've cracked it. Not especially well; my program still produces vaguely English-like gibberish. (For instance: SILANISULU SIE LATA TT NAI CED TRE THE NENENG WIPEDO TTOXCH F RE DOF CO A THIN AST CRY ORBICUE HACHLINA I IN WOMAN COUGE? STHON' NOLIM IT DEA? UTWERE ST DO HACH AUS OLENENDA HINCILF MISEN ITOLERRY and so on. Actually, it's getting worse; it used to look more like English: SNEW ORREN UN-IMAT;---COON CER TRE THE NINETED FID I SUE, HE IN. IFE CS. CRINE: OF CUSSO. FINE RUD'S MAN AND WOMAN CRI FEESER, ABOX ING ANGEL?--LODES STH JON CH ANA'AMSIDS FIENDLYGON DIGHONKIEST MEET PIASA. That's from the version with less differentiation of frequency in the dictionary files, one less training text, and a somewhat broken backtrack mechanism.) Anyway, not cracked well, but enough to get a string of 30 letters which looked like part of a sentence--seven words in the first message, six words in the second--and Googling the snippets I had was indeed enough to turn up the source of each.

I've employed a lot of little tricks along the way: training on texts, bigram and trigram frequency, dictionary matching, word-length frequency, backtracking.
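For the curious, the bigram-frequency trick can be sketched roughly as follows. This is only a minimal illustration and not the program described above: the names `normalize`, `train_bigrams`, and `score`, the 27-character alphabet, the add-one smoothing, and the tiny training string are all assumptions made up for the example.

```python
import math
from collections import Counter

# Assumed alphabet: uppercase letters plus space (an illustrative choice).
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ "

def normalize(text):
    """Uppercase the text and drop anything outside the assumed alphabet."""
    return "".join(ch for ch in text.upper() if ch in ALPHABET)

def train_bigrams(text):
    """Count adjacent character pairs in a training text."""
    text = normalize(text)
    return Counter(zip(text, text[1:]))

def score(candidate, bigrams):
    """Average log-probability per bigram, with add-one smoothing.

    Higher (less negative) scores mean the candidate looks more like
    the training text.
    """
    candidate = normalize(candidate)
    total = sum(bigrams.values())
    vocab = len(ALPHABET) ** 2  # number of possible bigrams
    logp = 0.0
    for pair in zip(candidate, candidate[1:]):
        logp += math.log((bigrams[pair] + 1) / (total + vocab))
    return logp / max(len(candidate) - 1, 1)

# Hypothetical toy training text; a real run would use whole books.
model = train_bigrams("the man and the woman met the other man")
englishy = score("THE MAN", model)
gibberish = score("XQZ JKV", model)
```

With a model like this, a candidate decoding whose letter pairs show up often in the training text scores higher than one full of unseen pairs, which is the sort of signal a backtracking search could use to decide which guess to keep.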

So the thing is, I still want to be a linguist. But sometimes, I miss the programming.


The Thinks I Think

...and another think coming

Career choices
