Okay, my own input on "Amazonfail" here.
As a cataloger myself, I can tell you that a technical screw-up could cause the observed result. Without knowing more about Amazon's database I could not tell you precisely how. (Even if I knew more, it would probably be obscure to me, as I doubt they are running Unicorn, so what I am about to describe is a hypothetical situation based on my own experience.) On our own system it would be something like "misconfigure a report that globally does something to subject heading indexes", which is easy to do if you are not scrupulously careful, because many reports have lots of fiddly little options even in the GUI. Those who venture to home-cook API scripts without first consulting Customer Care, beware!
I don't know that this was necessarily subject-category related (it could be some kind of invisible internal tagging Amazon applies, for instance). But if it was, many books have multiple subject headings, so it is plausible to me that something some programmer did somewhere affected only the titles that carried the "lucky" one or two headings, even though those titles shared their third and fourth subjects with other titles that were left alone. The effect would be that only part of the pool was masked.
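To make that concrete, here is a toy sketch in Python. Everything in it is invented for illustration: the titles, the subject headings, and the `apply_report` function are hypothetical and have nothing to do with Amazon's or Unicorn's actual internals. It only shows how a global rule keyed on one heading can hit part of a pool of titles that otherwise overlap.

```python
# Hypothetical catalog: each title carries several subject headings.
# All names here are made up for the example.
catalog = {
    "Title A": {"Gardening", "History"},
    "Title B": {"History", "Travel"},
    "Title C": {"Gardening", "Cookery"},
    "Title D": {"Cookery", "Travel"},
}

# The heading a misconfigured global report was (wrongly) told to act on.
flagged_headings = {"Gardening"}


def apply_report(catalog, flagged):
    """Return the titles the report would touch: any title that has
    at least one flagged heading, whatever its other headings are."""
    return {
        title
        for title, subjects in catalog.items()
        if subjects & flagged  # non-empty intersection means a match
    }


masked = apply_report(catalog, flagged_headings)
print(sorted(masked))
```

Title B shares "History" with Title A, and Title D shares "Cookery" with Title C, yet only A and C are touched. From the outside that looks like someone hand-picked the titles, when the rule never considered anything but the one heading.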
NB: I am not saying there is no problem here, merely that I think remarks like "nuh-uh, it was awfully selective to be just a mistake!" are underinformed about what can happen with massive book databases.
Date: April 16th, 2009 02:34 am (UTC)
I don't think any person or group "chose" that in the active sense. My speculation is that what did the "sifting" was entirely automatic: the code which writes search-result pages saw these items had no rank, or corrupt rank, or something like that, and so dropped them from results lists. (I seem to recall people saying that you could still locate things by searching by title.)
I don't mean there is a value judgement going on ("books with no ranking are not results anyone wants"), but rather something more like error-checking: if part of the record does not match the format the robot expects, it may be instructed to drop the whole record as a bad job rather than insert bad data, which might do anything from appearing poorly formatted onscreen to crashing the program.
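A minimal sketch of that kind of defensive error-checking, again in Python and again entirely hypothetical: the record layout, the `sales_rank` field name, and both functions are assumptions of mine for illustration, not anything known about Amazon's code.

```python
from typing import Optional


def render_result(record: dict) -> Optional[str]:
    """Format one search result, or return None if the record's rank
    is missing or malformed (hypothetical validation rule)."""
    rank = record.get("sales_rank")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(rank, bool) or not isinstance(rank, int) or rank < 1:
        return None  # drop the whole record rather than print bad data
    return f"#{rank}: {record['title']}"


def build_results_page(records):
    """Keep only the records that rendered cleanly."""
    rendered = (render_result(r) for r in records)
    return [line for line in rendered if line is not None]


# Invented sample data: two good records, two with missing rank.
records = [
    {"title": "Title A", "sales_rank": 12},
    {"title": "Title B", "sales_rank": None},  # rank stripped upstream
    {"title": "Title C"},                      # rank field absent
    {"title": "Title D", "sales_rank": 7},
]

page = build_results_page(records)
print(page)
```

Nothing in this code knows or cares what the books are about. Titles B and C vanish from the page purely because their rank field is not in the expected shape, which is the sort of automatic "sifting" I have in mind.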
Obviously I have not deconstructed Amazon's back end, but my programmer husband tells me this is a plausible scenario, if not verifiable in this case.
As to how that rank data problem occurred in the first place (what you are terming "sifting some books out of the sales-ranking factor", i.e. removing them from a possible pool), that's a separate question. I think the explanation that the books were excluded by accident is plausible, as in the scenario I described from cataloging experience, where something semi-automated that alters field B based on what's in fields A and C is misconfigured and yields results you did not intend. It is of course also possible that it was the evil intent people ascribed at first (akin to "let's make sure these kinds of books don't get any rank"), or the personal malice of someone who can alter this sort of data (an inside job rather than an external hack, but not something sanctioned by Amazon).
I was not trying to offer a complete explanation proving that this was all an accident, simply contesting the single notion of "it was too selective for that to be possible". The apparent selectivity could be a side effect of the specific internal structure of the relational database. Poking part of it in a certain way could produce this result without anyone having specifically intended it.