Saturday, July 03, 2004
Now *that* would be cool
Nowadays, if you have an affiliation with a good university, you can get almost any journal you want online, in full text (and text-searchable). It's like having a world-class library in your house, office, or anywhere you take your laptop. Except. There are precious few books online, Amazon's "look inside the book" notwithstanding.
But imagine you scanned the entire Harvard library into a database, and made it text-searchable. Your university could "subscribe" to Harvard Library, and give you everything in the Widener stacks right to your desktop in Omaha, Nome, or Ulan Bator.
And that's just the start. Now suppose you know some Perl, and you can run text extraction code on the Harvard library, as a whole or in any subset of books you choose. Suddenly, a fusion of programming skills and quantitative history could yield more social science than you could shake a very big stick at.
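To make that a little more concrete, here is a minimal sketch of what such a query might look like, written in Python rather than Perl. Everything in it is an assumption for illustration: the tiny `BOOKS` list stands in for a scanned library, and `term_counts_by_decade` and its `keep` filter are hypothetical names, not any real library service.

```python
import re
from collections import Counter

# Hypothetical stand-in for a scanned library: (title, year, full text).
BOOKS = [
    ("A History of Trade", 1852, "Trade and tariff policy. The tariff rose."),
    ("Letters from Omaha", 1887, "No word of the tariff reached us."),
    ("On Railways", 1858, "Railways carried trade across the plains."),
]

def term_counts_by_decade(books, term, keep=lambda title, year: True):
    """Count whole-word, case-insensitive occurrences of `term` per decade.

    `keep` selects the subset of books to scan; by default every book is used.
    """
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    counts = Counter()
    for title, year, text in books:
        if keep(title, year):
            decade = year // 10 * 10
            counts[decade] += len(pattern.findall(text))
    return dict(counts)

# Whole library, then only the books published before 1860.
print(term_counts_by_decade(BOOKS, "tariff"))
print(term_counts_by_decade(BOOKS, "tariff", keep=lambda t, y: y < 1860))
```

The interesting part is the `keep` argument: the same few lines run over the whole collection or over whatever slice of it a historian cares about.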
That's not mad social science, though. It's the future, and we'll see it, surely within 20 years; maybe much sooner.
Follow up: It would also be good to set up the data in the natural web structure provided by bibliographies. E.g., it should be very easy to get from a book to its sources, or to create source-citation maps. A sort of Science Citation Index for the whole card catalog, finding hypertext links where we thought we only had bound sheets of dead tree matter.
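A rough sketch of that bibliography web, again in Python and again under loud assumptions: the `SOURCES` table is made-up data, and `cited_by` and `all_sources` are hypothetical helper names. The idea is only that once bibliographies are stored as links, going from a book to its sources, or from a source to everything that cites it, is a short graph walk.

```python
from collections import defaultdict

# Hypothetical bibliographies: each book maps to the sources it cites.
SOURCES = {
    "Book A": ["Book B", "Book C"],
    "Book B": ["Book C"],
    "Book C": [],
}

def cited_by(sources):
    """Invert book -> sources into source -> sorted list of citing books."""
    reverse = defaultdict(list)
    for book, cited in sources.items():
        for source in cited:
            reverse[source].append(book)
    return {source: sorted(books) for source, books in reverse.items()}

def all_sources(sources, book):
    """Follow bibliographies transitively, collecting every source reached."""
    seen, stack = set(), [book]
    while stack:
        for source in sources.get(stack.pop(), []):
            if source not in seen:
                seen.add(source)
                stack.append(source)
    return seen

print(cited_by(SOURCES))
print(sorted(all_sources(SOURCES, "Book A")))
```

The reverse index is the citation-index half of the idea; the transitive walk is the "get from a book to its sources" half.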