Page 1 of 2 [ 31 posts ]  Go to page 1, 2  Next

kalantir
Veteran
Veteran

User avatar

Joined: 25 Dec 2008
Gender: Male
Posts: 712
Location: Redmond, WA USA

14 Jan 2009, 8:30 pm

Ok so Ive got a class called Player and I was originally going to make an array to contain all of them. I want to leave room for the possibility of up to 30,000 players at a time(I know its not very likely, but I want to have my game set up to be able to handle it just in case). So I figured that a giant array might not be the most efficient way to do things. For example, if Bob does "/whisper John sup man" I would end up having to do a linear search through the array to find John and take it from there. Now, that could potentially take 30,000 iterations through the loop to find John. That hardly seems efficient. So I started thinking, and came up with an idea that would make it an average of 1000 times faster(maximum of 15 iterations through the loop vs the average of 15,000 it would be if there were 30000 players logged on). I later came to find out this method is actually fairly well known and is called binary searching. Ok so thats all fine and dandy but, in order for binary searching to work, it assumes that the players are all in alphabetical order. To maintain the alphabetical order, I would need to constantly be moving players around in the array and such which I figured could potentially also be an inefficient way of doing things. So then I got this idea of splitting my players into 26 different arrays. One for each letter of the alphabet. That way, instead of having to rearrange a giant array every time anyone logs on or off, I could just rearrange a smaller array containing a fraction of the players. Great! I thought... I then realized that I would have to declare 26 arrays that could each contain up to 30,000 of my Player class(Again I realize its highly unlikely that 30,000 players with the same first lettered name would be on, but you can't be too carefull). So I searched for a way to declare a dynamic array of classes. This led to my discovery of vectors. Ive yet to use vectors, but they sound pretty neat. Or at least they did untill I read this.

Quote:
For operations that involve inserting or removing elements at positions other than the end, they perform worse than deques and lists, and have less consistent iterators and references than lists.


and this
Quote:
Reallocations may be a costly operation in terms of performance, since they generally involve the entire storage space used by the vector to be copied to a new location. Therefore, whenever large increases in size are planned for a vector, it is recommended to explicitly indicate a capacity for the vector using member function vector::reserve


rearranging them in alphabetical order will involve me inserting and removing elements at positions other than the end almost constantly. as for setting the reserve... I dont see how thats much different then doing Player Players[30000] 26 times...

So I have 2 questions.
1. Is there a way to declare a dynamic array of classes?
2. Is there a more efficient way to handle this?


_________________
2101729 Kalantir-Bar-Orc-Mal-Cha escaped the dungeon


t0
Veteran
Veteran

User avatar

Joined: 23 Mar 2008
Age: 50
Gender: Male
Posts: 726
Location: The 4 Corners of the 4th Dimension

14 Jan 2009, 9:29 pm

What language are you using?

Efficiency is going to depend on how often you insert or delete in your structure. We use an AVL tree for things like this. See: http://en.wikipedia.org/wiki/AVL_tree

In general, searching on a string is going to be slower than searching on a number. This is because you can perform a comparison on an N-bit system in a single instruction (where N is the number of bits the hardware supports). Comparisons on strings require loops that check the length of the string and then the data itself, usually a single character at a time. You'll be better off if you can limit the number of lookups based on strings and cache information using pointers.

You haven't specified the implementation, but you may also be better off storing the players in groups based on proximity or other criteria. If player A can't interact with player B due to distance or something like that, they should probably be in different groups.



kalantir
Veteran
Veteran

User avatar

Joined: 25 Dec 2008
Gender: Male
Posts: 712
Location: Redmond, WA USA

14 Jan 2009, 10:02 pm

Im using C++
Unless theres something really fundamental I dont understand... keeping them in seperate groups based on location and such wouldnt really work. The reason being this. When I recv() data from the client, the only information I have about the player is their file descriptor, and whatever information is sent in the packet. How am I going to know what group to look for the player in going only by that? Of course, the packet could include information about where the player is and I could use that. But I try to have the server tell the client where the player is rather then the other way around to avoid having people use packet sniffing to cheat. as for numbers vs strings... Id love to use numbers, but like in the above example...

Quote:
if Bob does "/whisper John sup man"

the only info I get on the target player is their name, so i dont have much choice in what to use when searching for them.

I looked at the link... its interesting cuz I was just, and I mean JUST reading about binary search trees. Ill take it into consideration, but alternatively someone on a different website suggested I use hash_set or hash_map which apparently has a cost of operation of O(1) compared to the AVL tree which has O(log n). If you know what either of those mean, or how they compare, Id really like to know.


_________________
2101729 Kalantir-Bar-Orc-Mal-Cha escaped the dungeon


t0
Veteran
Veteran

User avatar

Joined: 23 Mar 2008
Age: 50
Gender: Male
Posts: 726
Location: The 4 Corners of the 4th Dimension

14 Jan 2009, 11:29 pm

Ok, so you're writing a chat server or MUD or something similar.

In your example, hash table or AVL will both do fine. The O() notation is the standard by which efficiency of algorithms is represented. I don't believe hash tables are always O(1) - depends on the size of the table and how well its hash function distributes the nodes across the array.

One suggestion I would make is to keep a second index based on the socket. The socket is a number (fast for searching) so that it is easy to find the player when they send you input. If you're using Windows, look up IoCompletionPorts. They're much more efficient in Windows than using Berkley's select().

Another suggestion - if you're looking at 30k players, you'll probably want to listen() on multiple ip addresses. The port value is a 2-byte short which is only 64k connections on a single ip address (each incoming TCP connection uses a new port). If you have 30k active players, you'll probably have some additional users pending in a lobby or something similar. Plus it would be really easy for someone to write a client to peg your server at 64k connections blocking other users from getting in.



kalantir
Veteran
Veteran

User avatar

Joined: 25 Dec 2008
Gender: Male
Posts: 712
Location: Redmond, WA USA

15 Jan 2009, 12:19 am

thanks for the tips. I guess Ill explain a little bit about what Im doing. Im not doing a chat server or a MUD. I already did those and now I'm moving on to bigger things. More specifically an MMORPG. Im doing it more for fun and practice then anything. Im not expecting to make the next WoW or anything(All though wouldnt that be nice if I did :P). Thats the main reason Im trying to find the most efficient way to do things. With the MUD I did, I simply did a linear search to find the player in the player array. Which worked fine, MUD's are pretty simple and small programs by comparison to MMO's. I realize that for an MMORPG I need to focus as much on efficiency as possible so I'm trying to look at all possible solutions for various different things like this. Analyzing how they work, then trying to improve upon those methods(if possible).
Also, Im on linux. No way im making a server on windows...lol


_________________
2101729 Kalantir-Bar-Orc-Mal-Cha escaped the dungeon


wolphin
Velociraptor
Velociraptor

User avatar

Joined: 15 Aug 2007
Age: 36
Gender: Male
Posts: 465

15 Jan 2009, 4:32 am

to put it kindly, repeat after me:

Don't write your own data structure, unless you have to.

How do you know if you have to? Trust me, you would know. If you don't know if you need to, you probably don't need to.

If you're looking to learn how to build a data structure, build it separate from your game. Don't cause unnecessary pain for yourself by biting off more than you can chew at once.

Now to answer your question. The idea behind std::map is binary search, exactly as you described you want to do, and so to look up a single player out of N would take approximately log(N) time. log(30000) ~ 15 accesses, which is still reasonable.

std::map avoids the problem of the alphabetic ordering by using a red-black tree, which in many ways is similar to the AVL tree discussed above, or another similar structure called a heap. These are so-called "self-balancing" structures, meaning that they preserve the order of the entries by rearranging elements as necessary when you do an insertion. Because of the way the data is organized, in a tree structure instead of a linear array, however, the cost of insertion is still approximately on the order of log(N), meaning an insertion is not meaningfully more expensive than a lookup.

std::hash_map is even better, giving typical constant time performance (meaning the cost of a single operation is not dependent on the number of entries in the structure). Correct, it is not guaranteed performance (in the unlucky case that many player names hash to the same value) but it almost always gives good performance overall.

The major disadvantage of the hash_map (besides the worst-case performance, which you'd have to be very unlucky to hit) is that the elements are not stored in ordered order. It's up to you to decide if it's worth the additional penalty to keep them sorted, in which case you are better off with std::map

My recommendation: use std::hash_map<std::string, Player> as your structure. You can pass player names in and get their player class out very efficiently, and add new players efficiently too.



Death_of_Pathos
Deinonychus
Deinonychus

User avatar

Joined: 7 Nov 2008
Age: 37
Gender: Male
Posts: 351

15 Jan 2009, 6:16 am

EDIT - wow Im a 'tard. I missed your post about A) this not being a MUD and B) not being on Windows, thus rendering my original post (below the dashes) useless.

I would like to address your belief that MUDs are small, simple things as I believe it is fundamentally flawed.

While it is typically a true statement, it is false as a blanket statement. Most MUD's are hobbyists' projects that are simply adaptations of earlier MUD's source, which are in turn not all that well designed to begin with (diku). However counterexamples exist. MUD, the first game of this genre, was designed by Richard Bartle in 1979 and contained within it things which modern games lack, like complex command line parsing and honest-to-goodness event handling. MUDs have, as a whole, moved backwards from the very beginning.

For example, consider what I am doing in my MUD project (sorry for linking to an image, couldn't figure out how to get gmail, the only place I have a copy on this computer, to export the text in the right format). Even if you implemented half of the concepts I have written there, the MUD would not be simple or small.

In the following paragraphs I am not trying to mean, or rude. I am trying to assist you by giving you a reality check. These are lessons I had to learn the HARD way.

You are in way way over your head. And your desire to move to a graphical MMO instead of a text MMO, citing the increase in complexity, is disconcerting, as it belies your ignorance and overconfidence.

You say that you have already made a MUD. Do you mind sharing the source? It would give me much insight into the extent of your capabilities, and where you need improving the most. In example: A linear search through a static array of logged users is very very bad coding practice. Logged players should not be in a static array (perhaps a linked list), you should differentiate between players based off of a unique numerical key, and the search method should be improved (use a binary search or indexing).

You have to learn good programming techniques if you ever want to progress as a coder. Before you move onto an MMO, try rewriting your MUD codebase. Trust me, there is a lot of improvement that can be made - you just have to look for it.

Game programming is hard. It is difficult to get a barely playable game, and getting a game that is actually worth playing is mind-boggling. Game programming is what masochistic computer programmers do to themselves. I fully expect to spend YEARS on my project, and find it unlikely that some of the concepts will ever be properly implemented.

You need to do everything you can to make this easier on yourself. If you do not have realistic goals you will fall flat on your face and disenfranchise yourself. (I should know, it happened to me)

I suggest you follow the following steps:

1) Read some programming books. (I have a stack of such books almost as tall as I am, and every single one either explicitly describes a binary search or considers the concept implicitly elementary)

2) Successfully implement some significantly smaller projects in adherence with good programming techniques. (not sure what good programming techniques are? ask / look it up)

3) Give the first design phase (conceptualization and planning) it's due as it is the most important phase of any programming project.

4) Be patient. I'm talking zen-level patience here. Implementing a telnet talker alone can be a frustrating experience if you don't know what you are doing. Hell, it took me a full 2 hours to connect to my first database, and I had a book detailing step by step instructions (with pictures!). It kept failing without error message. Can you guess the reason? Turns out it was an outdated graphics driver that began flooding the event log as soon as SQL was loaded, preventing the C# IDE from using it for its own purposes. Like I said, zen-like patience.

5) Get really comfortable using Google. Coding is 50% Google, 50% luck. (see previous anecdote)

6) Don't be afraid to look outside your horizons. You said there was no way you were going to have a windows server. Why not? If the function-laden GUI of the C# IDE helps you get your brain around your project, then isn't it worth a try? That said I don't know Linux, but I would not reject using it just because I am in love with some other OS.

7) Don't be afraid to ask for help. You will need it, we all do.

8) Figure things out for yourself. Save pestering other people for when you are REALLY stuck. Nothing gets people to stop helping you faster then failing to do a Google search before you ask for help.

If you have more questions or feel like you need direction, feel free to ask. Please remember that I posted this in good faith and with good intentions.

-----------------------------------------------------------------------------------------------------
Original post -----------------------------------------------------------------------------------------------------

I am working towards a similar sort of project for myself (a multiplayer text based game project focusing on the implementation of interesting concepts). I have made two decision that have made my life easier, and will probably help you.

First - use C#. I know that some people hate it (and some of them hate for good reasons), but it works for me and has redeeming qualities that make me glad I chose to use it.

Second - Use C#'s built in DBMS (database management systems) to make your life easier. Not only is it way more optimized then anything either of us could code from scratch, it is actually pretty easy to use.

That said, it is just my opinion and I am no coding guru.

If you want my input on your project, or text games in general, its a hobby of mine so I will be glad to oblige. Also, if you want to hear the ideas I hope to implement on my own project, I'd love the opportunity to talk through it again.



kalantir
Veteran
Veteran

User avatar

Joined: 25 Dec 2008
Gender: Male
Posts: 712
Location: Redmond, WA USA

15 Jan 2009, 9:22 am

kalantir wrote:
Im doing it more for fun and practice then anything. Im not expecting to make the next WoW or anything(All though wouldnt that be nice if I did :P).


Honestly, Id love to share the codebase I made. I was very proud of it. However, I lost it along with all other projects I had finished/was working on recently due to an incident where my roommates at the time kicked me out and sold everything I owned. :( However, something interesting I did with it, was turn it into a graphical MORPG(notice the missing M for Massive). I didnt however, write my own client. I used a generic mmorpg client that was created by a guy I used to MUD with who goes by the name of skinhat. If you're interested, it should still be on his website. http://www.skinhat.com/

As for the criticism. I understand that you mean well, but seriously... bugger off. I specifically stated that it was more so for the learning experience and fun then anything else. Im not expecting that Ill ever even fully finish with this project. For example, I have to make a client and I dont know anything about 3d modeling. In addition to that there are a number of other things I will be doing which Ive never even attempted before. I just happen to find this kind of thing fun. Also, if you took the time to read my original post, youd realize that using google isnt an issue for me, as I mentioned research I did before posting. Rather then giving me the same lines of discouraging crap that you've no doubt gotten used to hearing yourself, you could try actually being helpful, as discouraging people from working on their projects which they are already in the middle of is counter productive for both you and the other person. lean from t0's example

Quote:
First - use C#

Yes, Ill just scrap all the work ive done so far and completely rewrite everything in a programming language im not familiar with. That makes perfect sense...

Quote:
3) Give the first design phase (conceptualization and planning) it's due as it is the most important phase of any programming project.

I spent plenty of time on this. Dont you worry about that... I used to have a file with all my ideas and plans in it, but I lost that with my MUD and everything else. Luckily, most of that remains in my head.

In any case though... this topic is now pointless as I have the answers I need now.


_________________
2101729 Kalantir-Bar-Orc-Mal-Cha escaped the dungeon


Last edited by kalantir on 15 Jan 2009, 7:39 pm, edited 4 times in total.

t0
Veteran
Veteran

User avatar

Joined: 23 Mar 2008
Age: 50
Gender: Male
Posts: 726
Location: The 4 Corners of the 4th Dimension

15 Jan 2009, 10:36 am

I don't think you have the right answer yet.

If you're writing a MMO (marketing name for MUD) with a graphical client, it changes quite a bit what you want to do. Take your example:

"/whisper John sup man"

The client software should convert this to <command id> <target id> <text> and then send it to the server. The server should check the packet to make sure the values are valid and then perform the command. This takes the load off the server and puts it on the client.

So that means the client needs a list of the logged in players and a corresponding numeric id to make things easier for the server. The server needs to keep the client updated as to the current player list.

I would step away from developing your efficient algorithm for 30k players and instead just build a class that is the player list and performs whatever search method you're familiar with. As long as your "PlayerList" class interface stays the same, the internal functions can change and you can optimize them over time. If you have a library that's going to do this for you, fine - use it, but don't develop something new. You're not at the point where you have 30k players.

I spent a lot of time running a CircleMud on Ultrix during the early 90s and later worked on a MMO at Microsoft (on NT). The principles are the same no matter what the architecture. I've since started a new mud in C#. I have enough written to walk around in the world and interact with items. Currently bogged down with writing UI for building the world but some day maybe I'll get there.



kalantir
Veteran
Veteran

User avatar

Joined: 25 Dec 2008
Gender: Male
Posts: 712
Location: Redmond, WA USA

15 Jan 2009, 6:01 pm

My god man, didnt circlemud annoy the hell out of you? They seemed to think that endless if/else and switch statements were the answer to everything... The only time I really liked circlemud was when I was making a DnD 2.0 MUD as they're already heavily based on 2.0. As for sticking to what I'm familiar with as you mentioned above, if I do that, then this takes away that part of the learning process which is the main reason I'm doing this.

Quote:
Currently bogged down with writing UI for building the world but some day maybe I'll get there

I decided that rather then spending a lot of time building the world, I'd psuedo-randomly generate the terrain based on a text file Id make.. for example. Ill have a text based world map thatd look something like...

Code:
~~~~II~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~+AR++~~~~~~~~~~~~~~~~~~~~ Legend
~~~~~~~~~~~~~~~+++++^^^^DVlRElllllllll~~~~~~~ ------
~~~~~~~~~~~~~~~^^^^^^^^^^lllllllllllll~~~~~~~  AR - Arcadia
~~~~~~~~~~+++++^^^MV^^^++lOGlll@C+^^SV~~~~~~~  DV - Deall Village
~~~~~~~~~~+++++^^++++++~~~~~~~~~~++++~~~~~~~~  EN - Engelwood
~~~~~~~~~~++^^^++++++~~~~~~~~~~~~~ll~~~~~~~~~  GA - Galwind
~~~~~~~~~~+^^^ENME%NE%~~~~^^^~~~~~lSE~~~~~~~~  GV - Geldar Village
~~~~~~~%++^^^^%%~~~%%%%~~~~~~~~~~~~~~~~~~~~~~  ME - Merenth
~~~~~~~~+++^^+++~~~~%%%%~~~~~~~~~~~~~~~~~~~~~  MC - Mitzelplic's Castle
~~~~~~~~+++^MC++~~~~~%%%%~~~~~~~~~~~~~~~~~~~~  MT - Morgar's Tower
~~~~~~~~~++^^^+++~~~~~%%%SM~~~~~~~~~~~~~~~~~~  MV - Mountain Village
~~~~~~~~~+++^^^++~~~~~~~~~~~~~~~~~~~~~~~~~~~~  NE - Necropolis
~~~~~~~~~~~~%llGVl~~~~~~~~~~~~~~~~~~~~~~~~~~~  OG - Oghmania
~~~~~~~~~~~~~%lllll~~~~~~~~~~~~~~~~~~~~~~~~~~  RE - Rebma
~~~~~~~~~~~~~~%lllll~~~~~~~~~~~&&&&GA~~~~~~~~  SC - Sanom City
~~~~~~~~~++~~~~~~~~~~~~~~~~~~~~~&^^&&&&&~~~~~  SV - Seaside Village
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~%&&&~~~~~~~  SE - Sengoku
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~  SM - Smyrna
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 + - forest  l - field  % - jungle  & - marsh  ^ - mountains
     & - desert  ~ - water  ^ - hills   % - beach  % - ice plains

[code tags added by lau... I thought you wouldn't mind]
[Edit: thanks lau. I normally dont think to do that unless its actual code. didnt think itd make a diff for this]

This is not my creation and it looks a little messed up compared to how it actually looks on screen in the mud, but you get the idea I hope.
And based on this, I would psuedorandomly generate the terrain. So, I would have some algorithm for generating mountain terrain, marsh terrain, jungle terrain, etc... randomly place trees and other such stuff. Then anything specific Id want to add would simply overwrite what got generated for that location. I decided on this approach because Im a one man team and cant be bothered to fuss with creating all that stuff too much, and I want a very large world to explore in.


_________________
2101729 Kalantir-Bar-Orc-Mal-Cha escaped the dungeon


Last edited by kalantir on 16 Jan 2009, 2:21 am, edited 2 times in total.

wolphin
Velociraptor
Velociraptor

User avatar

Joined: 15 Aug 2007
Age: 36
Gender: Male
Posts: 465

15 Jan 2009, 8:20 pm

t0 wrote:
The client software should convert this to <command id> <target id> <text> and then send it to the server. The server should check the packet to make sure the values are valid and then perform the command. This takes the load off the server and puts it on the client.

So that means the client needs a list of the logged in players and a corresponding numeric id to make things easier for the server. The server needs to keep the client updated as to the current player list.


Why on earth would you want to do something like that? Supposing you allocate 30 bytes per username and 2 bytes, synchronizing a 30,000 long list of player names and ID's with each client (almost a megabyte of data) sounds like a much larger strain on the server than a hashmap. And if each client only keeps track of the logged on clients, that's even worse, because each log-on requires the new data to be pushed to every logged on client.

Having the client forward the whole username to the server, and using the username as an index in a hashmap is not that big a performance impact on the server at all. Even larger systems with 100,000s or millions of users will often do a similar thing with an SQL backend.

Even more important, it's the simplest approach, and simpler is often better. If he spends the effort designing a sophisticated communications protocol, he's spending less time making his game fun, and won't have a problem with too many users. If he spends the time instead making the game, and it ends up that 30,000 users is too slow, he can either rework the algorithm at that point, or else just buy faster hardware.



t0
Veteran
Veteran

User avatar

Joined: 23 Mar 2008
Age: 50
Gender: Male
Posts: 726
Location: The 4 Corners of the 4th Dimension

15 Jan 2009, 8:37 pm

kalantir wrote:
My god man, didnt circlemud annoy the hell out of you? They seemed to think that endless if/else and switch statements were the answer to everything...


It was ok. I learned a lot looking at the algorithms and the way the initial Diku authors wrote things vs how Jeremy wrote them in Circle. Jeremy was a cool guy back then (probably still is) - we actually met one summer while I was still on the east coast and we talked for several hours about the code, muds, etc.

Quote:
I decided that rather then spending a lot of time building the world, I'd psuedo-randomly generate the terrain based on a text file Id make..


I wrote one of those for Diku and used it heavily when making maps back then. I chose to use Microsoft's XML format for my world files (3 lines of code to load & save - what could be easier) and they're much more complex that the old Diku world files.

wolphin wrote:
Why on earth would you want to do something like that? Supposing you allocate 30 bytes per username and 2 bytes, synchronizing a 30,000 long list of player names and ID's with each client (almost a megabyte of data) sounds like a much larger strain on the server than a hashmap.


Yeah, I was thinking at first that the server wouldn't need to keep the name data - but I see your point. I don't think 1MB is that much stress on the server - it's just (cheap) memory. I think it's still worthwhile to reduce the latency of each packet by eliminating the name lookup. If it's good enough for DNS, I think it's good enough for a game server.



kalantir
Veteran
Veteran

User avatar

Joined: 25 Dec 2008
Gender: Male
Posts: 712
Location: Redmond, WA USA

30 Jan 2009, 6:09 pm

Thought Id post here again if nothing else, for the sake of following up. Anyways, I did end up getting a hash table set up. Its really quite simple to set up and use. I don't see why so many people throughout various gamedev forums think its something only experts should attempt. So here's my results

Code:
Hash Table: Found - 30000/30000
Average - 0.333333 microseconds
Total - 0.01 secs
Low/High Hashes - 2/32767
Collisions - 10257
# of collisions per table entry -
0. 12138 - 37.0422%
1. 5481 - 16.7267%
2. 1686 - 5.14526%
3. 354 - 1.08032%
4. 79 - 0.241089%
5. 4 - 0.012207%
6. 1 - 0.00305176%
Empty. 13025 - 39.7491%


I'm using the RS hash algorithm. I tried various other ones(ap, sdbm, elf, etc...) and this one just works the best(at least with my test set of strings). In this test I took 30,000 different strings, assigned them to objects of type WordClass which was nothing more then
Code:
class WordClass
{
public:
    char Word[30];
    class WordClass *Next;
}

I then created my table, started the timer, added all 30k objects, retrieved them all, and stopped the timer. The table was of size 32768(2^15). As you can see, 12138/30000 objects were added to the table with no collisions. There were 5481 table entries which recorded 1 collision(meaning 2 objects sharing the same hash value). And the number of entries with more collisions goes down and down untill it gets to 6. There was only 1 entry in the entire table with 6 collisions. This means that when Im trying to find a player, in the worst case scenario, it will only ever take 7 iterations at most. but usually 1 or 2. Also keep in mind that only 1 player out of the 6 in that entry will actually take the full 7 iterations to find. That is going to be the player on the end of the linked list for that entry. Compared to a binary search for example, these results are incredible. A binary search would take 15(sometimes less) iterations to find the object(out of 30k anyways, because 30k is after 2^14 and before 2^15). In addition to this, a binary search would require me to have all my objects in order, which would cause me to spend even more time sorting my player array. If I wanted to increase the efficiency of the table further(less then 6k collisions!), I could double the table size and (very)slightly modify the hash algorithm to give values in a wider range. However, I've come to the conclusions that the slight increase in efficiency isn't really worth the excess memory required for the larger table.
Also note that the percentages were all calculated out of the table size(2^15). Also, the real version of this hash table(the one I put in my game vs this test one) should technically run slightly faster because I'm not keeping track of collisions and such, because that data is pretty much worthless to me and only slows it down.

Heres the rest of the output from the test program in case you might be interested
Code:
Linear Search: Found - 30000/30000
Average - 106.667 microseconds
Total - 3.2 secs

File Read: Found - 30000/30000
Average - 6.33333 microseconds
Total - 0.19 secs

Total elapsed time = 3.4 secs

File Read is some experiment I was doing with looking up players based on reading an int from a file. WAY better then a linear search as far as speed, but still no contest with hash table.


_________________
2101729 Kalantir-Bar-Orc-Mal-Cha escaped the dungeon


lau
Veteran
Veteran

User avatar

Joined: 17 Jun 2006
Age: 75
Gender: Male
Posts: 9,619
Location: Somerset UK

30 Jan 2009, 7:28 pm

Some words of caution.

===========

Where are you getting the memory for collided entries?

In fact, I suspect, form your class definition, you are allocating that from the heap, hence your table size is, in reality, a great deal larger than you imply.

There are good techniques for allowing collisions to be stored within the table itself. This also avoids all need for a "next" pointer.

============

Another point... just to be sure that you are testing with meaningful, real-life data.

If you generated a random set of 30,000 strings, your choice of hash algorithm will be fairly irrelevant. As that is what you say you measured, I guess that is what you did.

In real life, the distribution of your "usernames" will be far from random. Maybe the hash algorithms you've used are not sensitive to this, but I've certainly seen cases where a bad choice of hash has resulted in a disaster.

E.g. Just as a silly example, if the hash took the first three ASCII characters and joined their five ls bits one to the other as the 15 bit number. On the face of it, a "cheap" way to do things, but the letter distribution will cause the hash to "clump" badly.

===========

Anyway - "hash table" = "good", but you need to be very careful exactly what is happening. "Here there be dragons".


_________________
"Striking up conversations with strangers is an autistic person's version of extreme sports." Kamran Nazeer


kalantir
Veteran
Veteran

User avatar

Joined: 25 Dec 2008
Gender: Male
Posts: 712
Location: Redmond, WA USA

30 Jan 2009, 7:43 pm

wow. thanks for the reply. I'll definitely look into alternate methods of collision handling. As far as my strings go, they weren't randomly generated. I pulled them from a Dictionary.lst file I found. I was thinking of making my hash table so that if it starts getting too many collisions, it switches hashing algorithms and resorts the table. Based on this test I did, I don't expect that would cause any speed problems, or if it does, it'd just be a tiny bit of lag for a sec.

EDIT: the reason I originally chose the linked list approach was that, if I stick it into the next table entry instead, that could potentially cause a collision for another object that would ordinarily take that spot
While we're on the topic of hash tables. Would it be worth my while to do a perfect hash for my input parser? Or should I just stick with a binary search, which is what I'm already using? I haven't yet fully read up on generating perfect hash algorithms for a given set of strings, but it doesn't seem nearly as easy as what I've already done.


_________________
2101729 Kalantir-Bar-Orc-Mal-Cha escaped the dungeon


Last edited by kalantir on 30 Jan 2009, 8:01 pm, edited 1 time in total.

lau
Veteran
Veteran

User avatar

Joined: 17 Jun 2006
Age: 75
Gender: Male
Posts: 9,619
Location: Somerset UK

30 Jan 2009, 7:58 pm

If you can get hold of them, the Donald Knuth books have a bunch of stuff about handling collisions. You are quite right to spot that overflowing into the sequential entry is another disaster. Following a single pseudo-random pattern is exactly as bad. I don't recall the details, but the methods that work were a little devious, IIRC.

Of course, if you do put collisions inside the table, it can actually fill up, and getting anywhere near full is another pitfall.

=============

And... another trick that may be relevant... presumably you do not expect your whole membership to be actively online simultaneously, all the time?

Making your table only contain those actually active online - i.e. just a cache.


_________________
"Striking up conversations with strangers is an autistic person's version of extreme sports." Kamran Nazeer