tahnan: It's pretty much me, really. (Default)
[personal profile] tahnan
Amazon.com has now included full-text searching of books as part of its search engine.

This is a bad idea for a couple of reasons: first, if I'm searching for, I don't know, books edited by Will Shortz, I don't want something of David Sedaris's showing up on the list just because he mentions Shortz somewhere around page 200.

And second, because searching for a random string like "tqn" turns up such classic works as:


  • James Patterson's Along Came a Spider: ". . ñnnff®rr- mati®nn. She had nvifnnsed tt® talllk tt® tthe Burrrreaun, tQn®ungh. She°d graded w8tth me. tUyauni IIs in t11ne Anndes M®annnttainns, . . ."

  • Tolkien's Book of Lost Tales: "". . . ^_ ~i~eVi,..a."^ltsw K'~Y~Rneaw• ¡ Cs..t ~.tae ~~ lá tí~•t, b,.+.1 c,tqN • CB..c &.K- tome .. ,SH tome saaw •,„,J + . . ."



...well, you get the idea.

(no subject)

Date: 2003-10-28 05:47 am (UTC)
From: [identity profile] fuldu.livejournal.com
I was noticing this problem as well, just yesterday. I was searching for books with each of a finite set of words in the title (for a puzzle, so I'm omitting the actual info). Fortunately, the words are reasonably common, so I was able to gauge whether I'd gone too far by whether Under the Tuscan Sun had yet appeared on the list. I'm going back to doing book searches at Powell's.

(no subject)

Date: 2003-10-28 06:07 am (UTC)
From: [identity profile] mattbeo.livejournal.com
Perhaps they took scans of the pages (or just the digital galley proofs or whatever they're called now) and let some OCR software run amok on them without editing afterward. On the one hand, it seems that they should just request a full electronic copy of each text from the publisher and use that in their search database instead of this gobbledygook. On the other hand, if they did that, they might as well publish the stuff free because it would not be difficult to reconstruct and pirate the texts.

(no subject)

Date: 2003-10-28 07:31 am (UTC)
From: [identity profile] mildmannered.livejournal.com
Gobbledygook bug aside, it's an interesting feature if only for the possibility of seeing where certain phrases come from/breed. You know, like tracking the use of "coalition of the willing" or "pervy hobbit-fancier" or "multitudinous seas incarnadine" in books published over a period of several years.
