Aug. 4th, 2005

tahnan: It's pretty much me, really. (Default)
Sophomore year, I took a computer science class--CS 15, Introduction to Program Design, I think it was called--and decided that (a) it was fun and (b) I didn't want to spend my life doing that.

In the last few days, when I needed a break from my dissertation, I've been idly working on one of the programming challenges from ITA Software, who'd advertised them on the T. I decided that most of them looked boring, but the one marked "(hard!)" was interesting, so I gave it a shot.

A few days later, I've cracked it. Not especially well; my program still produces vaguely English-like gibberish. (For instance: SILANISULU SIE LATA TT NAI CED TRE THE NENENG WIPEDO TTOXCH F RE DOF CO A THIN AST CRY ORBICUE HACHLINA I IN WOMAN COUGE? STHON' NOLIM IT DEA? UTWERE ST DO HACH AUS OLENENDA HINCILF MISEN ITOLERRY and so on. Actually, it's getting worse; it used to look more like English: SNEW ORREN UN-IMAT;---COON CER TRE THE NINETED FID I SUE, HE IN. IFE CS. CRINE: OF CUSSO. FINE RUD'S MAN AND WOMAN CRI FEESER, ABOX ING ANGEL?--LODES STH JON CH ANA'AMSIDS FIENDLYGON DIGHONKIEST MEET PIASA. That's from the version with less differentiation of frequency in the dictionary files, one less training text, and a somewhat broken backtrack mechanism.) Anyway, not cracked well, but enough to get a string of 30 letters which looked like part of a sentence--seven words in the first message, six words in the second--and Googling the snippets I had was indeed enough to turn up the source of each.

I've employed a lot of little tricks along the way: training on texts, bigram and trigram frequency, dictionary matching, wordlength frequency, backtracking.
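
A minimal sketch of the bigram-frequency idea, in Python, might look like the following. It is an illustration only, not the program described above: the function names, the add-one smoothing, and the tiny training text are all assumptions made for the example. It counts letter pairs in a training text and then scores candidate strings by how English-like their letter pairs are.

```python
import math
from collections import Counter

# Letters plus space; everything else is treated as a word break.
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ "

# Hypothetical stand-in for a real training text (a novel, say).
TRAINING_TEXT = (
    "the man and the woman met the angel in the garden "
    "and then the man ran to the house and the woman sang"
)

def normalize(text):
    """Uppercase the text and collapse anything outside A-Z into single spaces."""
    chars = [c if c in ALPHABET else " " for c in text.upper()]
    return " ".join("".join(chars).split())

def train_bigrams(text):
    """Return log P(b | a) for every character pair, with add-one smoothing."""
    text = normalize(text)
    pair_counts = Counter(zip(text, text[1:]))
    first_counts = Counter(text[:-1])
    model = {}
    for a in ALPHABET:
        total = first_counts[a] + len(ALPHABET)
        for b in ALPHABET:
            model[a, b] = math.log((pair_counts[a, b] + 1) / total)
    return model

def score(candidate, model):
    """Sum of bigram log-probabilities; higher means more English-like."""
    candidate = normalize(candidate)
    return sum(model[a, b] for a, b in zip(candidate, candidate[1:]))

model = train_bigrams(TRAINING_TEXT)
print(score("THE MAN AND THE WOMAN", model))
print(score("XQZ JKV QXZ WVK QJXZB", model))
```

With a model like this, a search can compare candidate decodings of the same length and prefer the higher-scoring one; trigram counts, dictionary lookups, and word-length frequencies could be added to the score in the same way.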

So the thing is, I still want to be a linguist. But sometimes, I miss the programming.
