Sometimes you just have to take matters into your own hands.
After posting about the difficulty of finding a modern source for choosing a random city or a random river, I went ahead and created:
I took the data from Maxmind’s free world cities database, but there are odd gaps in it. Although the database lists about 3.3 million cities, the population field is blank for all but about 50,000. (Most notably, that field is blank for all but two of the thousands of cities in the Republic of Korea.) My server offers up a random city from among those 50,000. I can probably find a list of Korean cities with populations and manually tack them on to the list at some point. [Edited to add: I’ve just done that.]
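For anyone who wants to try the same thing at home, here is a minimal sketch of the filter-then-pick step, not the code my server actually runs. The column names and the handful of sample rows are assumptions made up for illustration; the real file's layout may differ.

```python
import csv
import io
import random

# Hypothetical sample rows; the column names ("Country", "City",
# "Population") are assumptions, not the real database schema.
SAMPLE_CSV = """Country,City,Population
us,idaho falls,52000
hn,buenos aires,979
tr,marmaris,
kr,andong,
"""

def cities_with_population(csv_text):
    """Return only the rows whose Population field is non-blank."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if row["Population"].strip()]

def pick_random_city(rows, rng=random):
    """Choose one row uniformly at random from the filtered list."""
    return rng.choice(rows)

usable = cities_with_population(SAMPLE_CSV)
city = pick_random_city(usable)
print(city["City"], city["Population"])
```

Note that this picks uniformly among cities, not among people, so small towns will dominate simply because there are far more of them.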
Now I want to do something similar for river lengths. Does anyone know where to get the data? (No, pawing through Wikipedia’s hundreds of separate lists, for each part of the world and each letter of the alphabet, is not an option.)
This reminds me that I once did something like this for baseball players. I call it the Evil Player program. It randomly orders all the players in the current season and prints the name of player 666 at the top.
http://www.baseballmusings.com/cgi-bin/EvilPlayer.py
I used it one year to pick a player at random and explain why he was evil.
Regarding your previous post, I don’t know about rivers, but when I need a data set to play around with, I go to Wolfram Alpha, ask a related question, and look at the Sources tab at the bottom to see which data sets Wolfram used. That’s often a good start.
In fact, just typing in “ten random cities” will get you a list of cities and some stats, but not the individual city populations. Maybe there’s a way to get it at Wolfram Alpha, but if you have Mathematica, you can pull down the data set, and a query like this will get you 10 random cities and populations:
Scan[Print[CityData[#, "FullName"], " : ",
CityData[#, "Population"]] &, RandomChoice[CityData[All], 10]]
It probably wouldn’t be too much more work to get the stats that you need to demonstrate Benford’s law.
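As a rough sketch of what those stats might look like, the Python below tallies the leading digits of a list of populations and compares the shares with Benford's prediction, log10(1 + 1/d). The function names and the tiny population list are hypothetical placeholders; a real check would need the full data set.

```python
import math
from collections import Counter

def leading_digit(n):
    """First decimal digit of a positive integer."""
    return int(str(n)[0])

def leading_digit_shares(populations):
    """Fraction of positive values starting with each digit 1 through 9."""
    counts = Counter(leading_digit(p) for p in populations if p > 0)
    total = sum(counts.values())
    return {d: counts.get(d, 0) / total for d in range(1, 10)}

def benford_expected(d):
    """Share of values Benford's law predicts for leading digit d."""
    return math.log10(1 + 1 / d)

# Made-up values for illustration only, far too few to test the law.
populations = [250, 3000, 979, 52000, 1200]

shares = leading_digit_shares(populations)
for d in range(1, 10):
    print(d, round(shares[d], 3), round(benford_expected(d), 3))
```

With only a handful of values the observed column means nothing; the point is just the shape of the comparison.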
Windypundit:
This, I think, is exactly the idea I was looking for. Thanks.
Clicking a few times led to some thoughts.
I came across one very soon with a population of about 250. Then I realised that the majority of the cities (villages?) will be tiny ones I have never heard of. How long before I come across one I know? It took 15 clicks until I came across Guadaloupe. But this one was in the Philippines, and not one I was aware of. At 23 I came across Cartagena. Yay, but alas this one was also in the Philippines, with a population of 3000. At 35 was Buenos Aires. At last. But this one was in Honduras, with a population of 979. On about 79 (I lost count, I admit) I came across Idaho Falls, which I am sure I have heard of, although I can’t think why I should know of this small town in Idaho (pop. 52000). Looking it up, it was the site of one of the first nuclear power accidents. The very next one was Marmaris in Turkey. Actually been there. Five later was Derby in the UK. Nothing for 80 tries, then three come at once. The wonders of randomness.
I wonder how random the old method of pointing a finger at a book really was. I bet people try to avoid the edges of the book so as not to miss it, and so are actually selecting for the central members of the list. True randomness is very difficult to achieve, but for most cases pseudo-randomness will suffice.
Harold’s reply (4) leads to an interesting (if you’re sufficiently geeky) game. How many clicks do you need to find someplace you’ve heard of and someplace you’ve been to? For me, it was 15 (Karpinsk, Russia) and 19 (Streator, Illinois). Alternatively, out of 100 clicks, how many of each for you? For me, there were three I’ve been to and four others I’ve heard of (two of them quite large, Darwin and Kirkuk).