I've been updating my coding skills over the last week or so, and have taken on a toy project to help myself explore Groovy scripting further. However, this question isn't really limited to Groovy; it's a general algorithm/design-pattern question. Say I have an ordered hash map of items, each with a weighted value: (Nothing: 50, Pair: 25, Two Pair: 10, Trips: 5).

How would you code a method that returns a random item from that map so that each item comes up with a frequency proportional to its weight? Ideally the method should work for any ordered map, or even any map, as I'm not sure how reliable Groovy's native map ordering is.

I've got a solution that works, but it doesn't seem to be the most efficient way of doing it.

I'm imagining there are a couple of programmers who've hit this before and sailed through it. I'm happy to share my code if it helps.

Well, twenty years ago, a programmer might simply have set up an array containing 'nothing' ten times, 'pair' five times, 'two pair' twice, and 'trips' once (your weights reduced to 10:5:2:1), then picked a random element. That might still be a decent brute-force way to do it, but I'm sure there's a much more elegant algorithm available.
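A minimal sketch of that lookup-array approach, written in Python for brevity (a Python dict keeps insertion order, much like a Groovy map literal). The names `build_table` and `pick_from_table` are made up for illustration. Dividing by the greatest common divisor reproduces the 10:5:2:1 array described above.

```python
import random
from functools import reduce
from math import gcd

# Weights from the question (they sum to 90).
WEIGHTS = {"Nothing": 50, "Pair": 25, "Two Pair": 10, "Trips": 5}

def build_table(weights):
    """Repeat each item once per unit of (reduced) weight.

    Dividing by the GCD turns 50:25:10:5 into 10:5:2:1, but memory is
    still proportional to the total weight, not to the number of items.
    """
    g = reduce(gcd, weights.values())
    table = []
    for item, weight in weights.items():
        table.extend([item] * (weight // g))
    return table

def pick_from_table(table, rng=random):
    """Constant-time pick: every slot in the table is equally likely."""
    return table[rng.randrange(len(table))]

table = build_table(WEIGHTS)
print(len(table))  # 18
print(pick_from_table(table))
```

The pick itself is a single random index, but the table grows with the weights, which is exactly what gets ugly once they run into the millions.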

Do you know if video poker machines (I assume that's what you're experimenting with here) still use that same ancient C program that ran on the 486 processors in 1993? I know that IGT was rather reluctant to upgrade their software as long as the old code worked, since a video poker program isn't exactly taxing for the CPU.

The real weights are much larger than the ones I've used here, so that array would get ugly fast. The hash map itself is small, though.

No idea, but I'd suspect a lot of the core code has been the same for years. Writing bulletproof code is hard, and testing it to make sure it's bulletproof is what they pay me for. If I were IGT, I'd avoid paying someone like me and stick with something that was known good.

Abstractly, you're talking about the same concept as a weighted slot machine reel: you want to 'spin' the array and display the chosen 'symbol'. The algorithm choice comes down to space vs. time. You can do it in constant time if you pre-generate an array of size N = total weight, as mkl suggested, and then just draw a random index between 0 and N-1. But if you're looking for something more space-efficient, you can build a tree (or simply an array of running totals) over the weights and find the pick in O(log n) time, where n is the number of items. How big is N, anyway?
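Rather than an explicit tree, the sketch below keeps a sorted array of running totals and binary-searches it, which gives the same O(log n) lookup with memory proportional to the number of items rather than to N. `CumulativePicker` and `item_at` are illustrative names, not an existing API.

```python
import bisect
import itertools
import random

WEIGHTS = {"Nothing": 50, "Pair": 25, "Two Pair": 10, "Trips": 5}

class CumulativePicker:
    """O(n) setup and memory (n = number of items), O(log n) per pick."""

    def __init__(self, weights):
        self.items = list(weights.keys())
        # Running totals, e.g. [50, 75, 85, 90] for WEIGHTS.
        self.totals = list(itertools.accumulate(weights.values()))

    def item_at(self, r):
        """Map 0 <= r < total to the item whose running total first exceeds r."""
        return self.items[bisect.bisect_right(self.totals, r)]

    def pick(self, rng=random):
        # Uniform draw from 0 .. total-1, then binary search.
        return self.item_at(rng.randrange(self.totals[-1]))

picker = CumulativePicker(WEIGHTS)
print(picker.item_at(74))  # Pair
print(picker.pick())
```

Python's own `random.choices(items, weights=...)` works along the same lines (running totals plus `bisect`), so in Python you would rarely write this by hand.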

Add up all the weights.

Pick a random whole number between 1 and that total (inclusive).

Subtract the first weight from your picked number.

- If you hit zero or below, that's your pick.

- If not, go to the next weight and repeat.

Example:

50, 25, 10, 5 are the weights for picks 1, 2, 3, 4.

Pick a number between 1 and 90 (example: 75).

Subtract weight of pick 1 (75 - 50 = 25).

Drop to zero or below? (No)

Subtract weight of pick 2 (25 - 25 = 0).

Drop to zero or below? (Yes)

Pick 2 is your weighted pick.

Example 2:

Pick a number between 1 and 90 (example: 88).

Subtract weight of pick 1 (88 - 50 = 38)

Subtract weight of pick 2 (38 - 25 = 13)

Subtract weight of pick 3 (13 - 10 = 3)

Subtract weight of pick 4 (3 - 5 = -2)

Drop to zero or below? (Yes)

Pick 4 is your weighted pick.
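The subtract-until-zero steps translate almost line for line into code. A minimal sketch, assuming integer weights in map order; `item_for_number` and `weighted_pick` are hypothetical names. It reproduces both worked examples.

```python
import random

WEIGHTS = {"Nothing": 50, "Pair": 25, "Two Pair": 10, "Trips": 5}

def item_for_number(weights, n):
    """Walk the weights in order, subtracting each one from n (1 <= n <= total).
    The first item that drives n to zero or below is the pick."""
    for item, weight in weights.items():
        n -= weight
        if n <= 0:
            return item
    raise ValueError("n exceeds the total weight")

def weighted_pick(weights, rng=random):
    total = sum(weights.values())
    # randint includes both endpoints, matching "between 1 and the total".
    return item_for_number(weights, rng.randint(1, total))

print(item_for_number(WEIGHTS, 75))  # Pair
print(item_for_number(WEIGHTS, 88))  # Trips
```

This needs no extra memory but costs O(n) per pick; with a small map that is negligible, and listing the heaviest items first shortens the average walk.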

If you're willing to trade memory for speed, you can make selection run in constant time. For example, use the 'alias method', described here: http://prxq.wordpress.com/2006/04/17/the-alias-method/ It is really elegant: after a one-time setup, each pick runs in constant time, and it needs only memory linear in the number of items.
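Below is a sketch of Vose's variant of the alias method (the linked post may present the setup differently). Each of the n columns holds an item's own probability plus an "alias" item that fills the rest of the column, so a pick is one uniform column choice and one biased coin flip. `AliasTable` is an illustrative name, and floating-point probabilities are assumed to be acceptable.

```python
import random

WEIGHTS = {"Nothing": 50, "Pair": 25, "Two Pair": 10, "Trips": 5}

class AliasTable:
    """Vose's alias method: O(n) setup, O(n) memory, O(1) per pick."""

    def __init__(self, weights):
        self.items = list(weights.keys())
        n = len(self.items)
        total = sum(weights.values())
        # Scale weights so their average is exactly 1.
        scaled = [w * n / total for w in weights.values()]
        self.prob = [0.0] * n
        self.alias = [0] * n
        small = [i for i, p in enumerate(scaled) if p < 1.0]
        large = [i for i, p in enumerate(scaled) if p >= 1.0]
        while small and large:
            s = small.pop()
            big = large.pop()
            # Column s keeps item s with probability scaled[s];
            # the rest of the column is handed to item big.
            self.prob[s] = scaled[s]
            self.alias[s] = big
            scaled[big] -= 1.0 - scaled[s]
            (small if scaled[big] < 1.0 else large).append(big)
        # Leftover columns are full (exactly 1, up to rounding).
        for i in small + large:
            self.prob[i] = 1.0

    def pick(self, rng=random):
        i = rng.randrange(len(self.items))  # choose a column uniformly
        if rng.random() < self.prob[i]:     # biased coin flip
            return self.items[i]
        return self.items[self.alias[i]]

table = AliasTable(WEIGHTS)
print(table.pick())
```

The table has one entry per item no matter how large the weights are, so a total weight of a million costs no more memory than a total of 90.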

N is one million.

Yeah, then your best bet is something more memory-efficient than the single-array pick. You've got a few choices; both the algorithm Dween described and the tree traversal I suggested are covered here:

http://eli.thegreenplace.net/2010/01/22/weighted-random-generation-in-python/

More on the Alias method, including a comment section describing how it works:

http://cg.scs.carleton.ca/~luc/rnbookindex.html

Chapter 3 probably has a solution for what you want. I haven't read it yet, but I'll probably take the PDFs to a printer and make myself a hard copy (the author says it's okay).

I'm going to take a look at the other algorithms as well.

I'm peeling the whole thing off into a separate object, so the pay table, frequencies, and other items will be methods and variables I can call at any time.

