> Let's say that index.head[0] is null so i would have to set a head.
> so i should just set index.head[0] to the first WORD_T that comes in?

Yes.

> and i don't understand something about find_word i can see that it
> could be used to delete the words... but the problem is since we are only
> using singly linked lists then when it returns the word we won't be able
> to know the previous node... so i can't set the previous nodes next to
> the one after the deleted node..??? so should i delete in find_word or
> what?

You can use find_word for two purposes. First, you can call find_word to
discover if the word is already present in the index before creating a new
node and trying to insert it. If it's found, then you may want to update
its section number array using the returned pointer. Second, you can use
find_word to delete a node. If you find the word in the index and return a
pointer to that node, you can use that address in the exclude_words
function to find the previous node: traverse the list from its head until
you reach the node just before the one the pointer refers to.

--------------------------------------------------------------------------

> I don't quite understand when we should be using malloc.
> Do we hard code the array sizes for word and sections in
> the struct? If not, how (or when) do we use malloc to do
> that?

The number of linked lists in the index is constant, as are the sizes of
the word string and section array, so they can be hard-coded (if you're
not doing the extra credit). You'll need to use malloc whenever you create
a new node of WORD_T to insert into the index. For the extra credit
portions, you need to use malloc to allocate space for the word string and
the nodes in the section number linked list.

--------------------------------------------------------------------------

> When I read in the words from the omit file using fgets, and storing in my
> variable called omit, then I look at the length of that variable, it seems
> to be reading in too many chars.
>
> ie.
> the first word in the omit list is "a", and as soon as I read in
> that word and use strlen, it tells me the length is 2. And I know that
> strlen doesn't even count the new line character. So what is it counting
> and how do I get around that?

strlen doesn't count the string terminator ('\0'), but it does count the
newline ('\n') as a character. fgets keeps the newline at the end of the
line it reads, so overwrite that newline with '\0' before you use the word.

--------------------------------------------------------------------------

> Will the input file ever look like this?
>
> 3RTY
> In words, will the file try to trick the program to see if it is
> only checking the first character in the line for a digit? In
> the above case, this would not be a new section according to the
> specifications, but in my program as it runs now it would. You
> said there could be up to 30 sections, but I didn't know if we
> had to check for that case since you didn't specify if there were
> any input standards.

You don't have to worry about the above case. If there's a blank line and
the next line begins with a digit, it's a new section.

--------------------------------------------------------------------------

> Also, on the paper it says "The head array in INDEX_T will contain
> one entry for each letter in the alphabet." This is really confusing
> because INDEX_T is a type of structure, not a structure. I don't see
> why you would have a structure with one field anyway unless it is just
> easier to see it being passed around to functions that way. Is there
> something wrong with accessing the array directly with a pointer?

There is no difference between the type name INDEX_T and the underlying
structure type that it names. If what you really mean is that INDEX_T is
not the same as a variable that has been allocated space for the struct,
then you need to think of it more abstractly. As for structs with one
field, occasionally it's convenient to place a type within a struct to
hide its complexity in formal and actual parameters. Additionally, it
results in more readable code, and for these reasons, it was done here.

--------------------------------------------------------------------------

> when we are "removing punctuation and other extraneous characters", should
> we remove apostrophes from contractions and index the contraction without
> the apostrophe.

There will be no contractions in the input file, so you don't need to
worry about them.

--------------------------------------------------------------------------

> what about hyphenated words?
>
> can we assume that every word in a line will be separated by a space
> character?

No hyphenated words will appear in the input. And yes, every word will be
delimited by white space.

--------------------------------------------------------------------------

> Will there ever be a case where there are tricky indexes, like 0, negative
> numbers, or numbers out of order? I started coding for those cases and
> realized it would probably be a waste of time since I don't think you're
> that cruel. ;)

No tricky section numbers (0, negative, or out of order); however, the
section numbers might not start at 1.

--------------------------------------------------------------------------

> also, so we ONLY allocate memory when we're adding new WORD_T's? and the
> program will free them automatically?

Yes and no. You only allocate memory when creating a new node to insert
into the index. You must free the memory whenever you delete a node from
the index; the system will only reclaim memory when your program
terminates.

--------------------------------------------------------------------------

> Is there a function like "is_digit()" that can check a char
> to see if it is a number, and if so could you give us the specs for it.

You can easily check for a digit by testing whether the character is in
the range '0' to '9'. The standard library also provides isdigit(),
declared in <ctype.h>, which performs the same test.

--------------------------------------------------------------------------

> If you have time, can you put another test driver on the web with its
> solution?

I encourage you to create another test file yourself and verify the
results (it's quite easily done).

--------------------------------------------------------------------------

> Whenever I try to get the section number from a line I
> just read in it gives me a number like 49 for 1, 50 for 2,
> and so on.
> I am setting the section like this:
>
> section = (int) line[0];
>
> Is there a special way we need to extract the number from
> a line of chars?

You're getting the ASCII value for the character instead of the int value.
You should use sscanf with the %d conversion.

--------------------------------------------------------------------------

> I've heard that we need to free all the memory after
> we've printed the index, but on the handout it doesn't
> say anything about that. Could you clear this up.

You only need to free memory when you delete a node (omit a word). Upon
program termination, all of the memory the program has allocated will be
freed automatically by the system.

--------------------------------------------------------------------------

> Are there going to be any sections numbers with 3 or more digits?
> Or will all sections numbers have 1 or 2 digits only?

Your program should be able to handle any int as the section number.

--------------------------------------------------------------------------

> I have two questions. How can I read an individual word from file
> and assign it to int or char? Should I use fscanf or use fgets and
> sscanf?

First, you'll be storing words in strings (i.e., character arrays), not
int's or char's. You can use either fscanf or fgets/sscanf. Either one you
choose will allow you to assign the string.

> Second question is the range number of section we could have from 0
> to 30 or the number of different section are 30 regardless of the value?

Your program should handle 30 different sections, regardless of the
section values.

--------------------------------------------------------------------------

> When I use function exit() in my function " write_index" like this way:
> printf("Cannot open file index.txt!!");
> exit();
> }
>
> compiler give me the following error information:
>
> jedi3[50] gcc index.c
> index.c: In function `write_index':
> index.c:254: too few arguments to function `exit'
>
> I check the note in class and it shows me this way. Is there anything wrong?

exit() takes an int parameter (the exit status), so pass it a value, for
example exit(1). A compiler that has seen the prototype from <stdlib.h>
will reject a call that leaves the value out; without the prototype in
scope, a compiler may accept the call without complaint.
Also, I would suggest that you use man when you have questions about
function prototypes. Your problem is interesting since I tried compiling
on that machine with gcc and got no error.

--------------------------------------------------------------------------

> I have couple questions.
> 1) could we assume if a word start with alphabet then it should be a valid
> english word. I mean there won't have something like "W2O" and the in
> similar, if a word start with a number, then it must be a integer, like
> "54", and there won't have anything like "2nd", right?

You can assume that words will not have embedded numbers in them and that
section numbers will be integers.

> 2) for word like "non-technical", should we treat it as one word or
> separate into two?

If you take a look at the sample index created, it treats it as two words.
I would suggest that for each line of text, you just run through the
string and replace all non-alphabetic characters with blanks (after
checking for a new section, of course). Then anything that's left is a
word.

> 3) should we change every char in a word into lowercase? such as for
> "arrayLength", we should change it into "arraylength"?

If you take a look at the sample index, you'll see that it changes every
letter to lowercase and then inserts it into the index.

> 4) when fscanf() and sscanf() read word from string and file respectively,
> they will read word by word, which is separated by space, right? And as my
> understand, they will omit space, so we don't need to include space in
> format, right? for example, for
> char line[30] = "I am a student."
>
> if I write:
> sscanf(line, "%s", word);
> sscanf(line, "%s", word);
> sscanf(line, "%s", word);
> sscanf(line, "%s", word);
>
> then the first time word is "I", 2nd time word is "am", 3rd time word is
> "a" and 4th time word is "student." right?
>
> will sscanf() and fscanf() add '\0' after each word it formed, like "I\0"?

The scanf() family of functions will read a string from the current
character to the next white space -- and no white space will be included
in the string. They will, however, append a string terminator. You should
test out the code you've written above in a short program. You will find
that it doesn't work the way you think because you keep passing sscanf()
the same string, and it will start processing that string at the beginning
on each call. So, all of your words will be "I". You'll need to skip over
each word after you read it and pass sscanf a pointer to the place in the
string where you want it to start reading.

--------------------------------------------------------------------------

> I have read in the FAQ that we do not need to check
> for hyphenated words i.e., there won't be any hyphenated word in the text
> and also that the words will be separated by a space. But in the sample
> text you have put on the web for us to check on our program, there are
> hyphenated words and in the sample output I understand that you have
> checked for these hyphenated words and separated them.
> Also in the sample text there is a C program example and it has int
> square(int n) as a part of it. Then my program reads the "square(int" part
> as "squareint" since here square and int are not separated by space. For
> this also in the sample output I see that you have separated the words
> square and int. Do we have to take care of these situations or should we
> assume that there won't be these kind of situations in the text that will
> be used to test our program. Please answer my question.

Please create the index as the sample index is done. I would suggest that
you just change all of the non-alphabetic characters to blanks after
checking for the section number. Anything that's left is a word.

--------------------------------------------------------------------------

> if some string like " The "public class..." keywords", we should read
> "class..."" as a word and then delete the "..."
> right?

Yes, delete all non-alphabetic characters.

--------------------------------------------------------------------------

> for word from your sample input, there are words like "i.e.,", should
> we treat them as two words like "i" and "e" or one word like "ie"?
> If we should treat it as two words, then it means not all words are
> delineated by white space.

If you look in the sample index, you'll see that they are treated as two
separate words ("i" gets omitted, but "e" is still in there). You should
also consider that words that appear with a comma or period after them are
not delimited by white space either, until you remove all non-alphabetic
characters.

--------------------------------------------------------------------------

> The output of our program after omitting the words in omit.txt
> should exactly match with the one given on webpage.!!right?

That would be the ideal situation. If it doesn't match exactly, the only
differences should be in spacing.

> I have written a print_list() function which prints the words
> and section numbers to help in making sure that the words have
> been correctly inserted and deleted when the functions insert_word()
> and exclude_words are called. Is it ok, if i just leave it in the program?

It's OK to leave it in there.

--------------------------------------------------------------------------

> Are we permitted to employ both the extra-credit features in
> our program or only either of them ??

You may do both portions for up to 10 points extra credit.

--------------------------------------------------------------------------
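Several of the questions above (setting an empty head, when to call
malloc, and deleting a node when find_word only gives you the node itself)
come down to the same few pointer operations, so here is a rough sketch of
them. Everything about the declarations is an assumption made for
illustration: the handout defines the real WORD_T and INDEX_T, and the
field names, array sizes, function signatures, and the helper name
delete_node below are all hypothetical. The sketch also inserts at the
front of the list for brevity; it does not keep the words in any order.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define WORD_LEN     32   /* assumed sizes; the handout defines the real ones */
#define MAX_SECTIONS 30
#define NUM_LISTS    26

typedef struct word_t {          /* hypothetical layout of WORD_T */
    char word[WORD_LEN];
    int sections[MAX_SECTIONS];
    int num_sections;
    struct word_t *next;
} WORD_T;

typedef struct {
    WORD_T *head[NUM_LISTS];     /* one list per letter of the alphabet */
} INDEX_T;

/* Return the node holding word, or NULL if it is not in the index.
   Assumes word is non-empty and starts with a lowercase letter. */
WORD_T *find_word(const INDEX_T *index, const char *word)
{
    WORD_T *p = index->head[word[0] - 'a'];

    while (p != NULL && strcmp(p->word, word) != 0)
        p = p->next;
    return p;
}

/* malloc a node for word and make it the new head of its list.
   This also covers the empty-list case, where head[i] is NULL.
   Returns the new node, or NULL if malloc fails. */
WORD_T *insert_word(INDEX_T *index, const char *word, int section)
{
    WORD_T *node = malloc(sizeof *node);

    if (node == NULL)
        return NULL;
    strncpy(node->word, word, WORD_LEN - 1);
    node->word[WORD_LEN - 1] = '\0';
    node->sections[0] = section;
    node->num_sections = 1;
    node->next = index->head[word[0] - 'a'];
    index->head[word[0] - 'a'] = node;
    return node;
}

/* Unlink and free the node target points to (as returned by find_word).
   The previous node is found by walking the list from its head. */
void delete_node(INDEX_T *index, WORD_T *target)
{
    WORD_T **head;
    WORD_T *prev;

    assert(index != NULL);
    if (target == NULL)
        return;
    head = &index->head[target->word[0] - 'a'];
    if (*head == target) {
        *head = target->next;            /* deleting the first node */
    } else {
        prev = *head;
        while (prev != NULL && prev->next != target)
            prev = prev->next;
        if (prev == NULL)
            return;                      /* target is not in this list */
        prev->next = target->next;
    }
    free(target);
}
```

With these in place, an exclude_words function could call
delete_node(&index, find_word(&index, omit)) for each word read from the
omit file; passing the NULL that find_word returns for a missing word is
harmless here.

--------------------------------------------------------------------------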
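The strlen question and the "49 for 1" question above are both about what
is really sitting in the buffer after fgets. The sketch below shows one
way to handle each; the function names chomp and parse_section are made up
for this example and are not part of the assignment. Note that it is still
up to the caller to remember whether the previous line was blank, since a
leading digit only starts a new section after a blank line.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Remove the newline that fgets leaves at the end of the buffer, if
   there is one. strcspn returns the index of the first '\n', or the
   length of the string when no newline is present. */
void chomp(char *s)
{
    assert(s != NULL);
    s[strcspn(s, "\n")] = '\0';
}

/* If line begins with a digit, store the integer it starts with in
   *section and return 1. Otherwise return 0 and leave *section alone. */
int parse_section(const char *line, int *section)
{
    if (!isdigit((unsigned char) line[0]))
        return 0;
    return sscanf(line, "%d", section) == 1;
}
```

Because sscanf converts the whole run of digits, this works for section
numbers with any number of digits, which (int) line[0] cannot do even
after subtracting '0'.

--------------------------------------------------------------------------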
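Finally, a sketch of the advice repeated in several answers above: change
every non-alphabetic character to a blank, fold the rest to lowercase, and
then pull the words out one at a time without restarting at the beginning
of the string. This is only one possible approach. The names clean_line,
next_word, and MAX_WORD are invented for the example, and the 32-character
limit is an assumption. The %n conversion stores the number of characters
sscanf has consumed so far, which is what lets the caller skip past each
word.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

#define MAX_WORD 32   /* assumed limit; must stay in step with "%31s" below */

/* Replace every non-alphabetic character with a blank and convert the
   letters to lowercase, so that anything left in the line is a word. */
void clean_line(char *line)
{
    assert(line != NULL);
    for (; *line != '\0'; line++) {
        unsigned char c = (unsigned char) *line;
        *line = isalpha(c) ? (char) tolower(c) : ' ';
    }
}

/* Copy the next word starting at *pos into word (at least MAX_WORD
   chars) and advance *pos past it. Returns 1 if a word was read and
   0 when the rest of the line holds no more words. */
int next_word(const char **pos, char *word)
{
    int used = 0;

    if (sscanf(*pos, "%31s%n", word, &used) != 1)
        return 0;
    *pos += used;
    return 1;
}
```

A caller would run clean_line on the line after checking it for a section
number, set a pointer to the start of the line, and loop while next_word
returns 1, looking up or inserting each word as it goes.

--------------------------------------------------------------------------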