LebGeeks

A community for technology geeks in Lebanon.

#1 January 20 2011

Joe
Member

Exercise - File Stats

Hello geeks.

Today's exercise will be a little easier so that more people can participate. At the same time, you will notice that it takes a lot of skill to do things right. So without further ado, let's get started:

File Stats

This exercise is a variation on the first exercise we had on the forum: Word Count.

This time, you are asked to compute trickier statistics about your file, namely:

* Number of characters.
* Number of characters without the space char.
* Number of words.
* Number of paragraphs.

As a reference, you can test your program on this file.

Feel free to create a GUI or deploy a webapp that does that; you don't have to be limited to the console. Oh, and you're free to use exotic languages. Bonus points for whoever does it in Erlang.

Optional development for Unix developers:

The file should be sent as a command line argument, and your program should also have command line options:

-o or --output: specify a name for an output file.
-v or --verbose: shows the commands on the screen.
-h or --help: display a help message.
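
As an illustration of the options listed above, here is a minimal sketch using Go's standard flag package. The options struct, the parseArgs helper and the "filestats" program name are made up for this example. Note that the flag package accepts both -name and --name, generates the -h/--help text itself, and stops parsing at the first non-flag argument, so in this sketch the options have to come before the file name.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
	"os"
)

// options holds the parsed settings (hypothetical type for this sketch).
type options struct {
	output  string // value of -o / --output
	verbose bool   // set by -v / --verbose
	file    string // the single positional argument
}

// parseArgs parses the arguments that follow the program name.
func parseArgs(args []string) (options, error) {
	var opts options
	fs := flag.NewFlagSet("filestats", flag.ContinueOnError)

	// Register the short and long spellings against the same variable.
	fs.StringVar(&opts.output, "o", "", "name of the output file")
	fs.StringVar(&opts.output, "output", "", "name of the output file")
	fs.BoolVar(&opts.verbose, "v", false, "show details on the screen")
	fs.BoolVar(&opts.verbose, "verbose", false, "show details on the screen")

	if err := fs.Parse(args); err != nil {
		return opts, err
	}
	if fs.NArg() != 1 {
		return opts, fmt.Errorf("expected exactly one file, got %d", fs.NArg())
	}
	opts.file = fs.Arg(0)
	return opts, nil
}

func main() {
	opts, err := parseArgs(os.Args[1:])
	if errors.Is(err, flag.ErrHelp) {
		os.Exit(0) // the usage text has already been printed by the flag package
	}
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	fmt.Printf("file=%s output=%q verbose=%t\n", opts.file, opts.output, opts.verbose)
}
```

With this sketch, something like "filestats -o stats.txt --verbose input.txt" would fill in all three fields.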

#2 January 20 2011

arithma
Member

Re: Exercise - File Stats

Do the *nixes have some library support for handling those command line arguments?

#3 January 20 2011

Joe
Member

Re: Exercise - File Stats

Yes. Well I don't know about other Unices, but most Linux distributions come equipped with the <getopt.h> library. The library is part of the GNU C library, so I don't think you'd find it by default on other Unices. I know Solaris has its own proprietary implementation of a <getOpt.h>. That's as far as I know.

It's somewhat complicated to use, but believe me it is easier than having to parse that command line yourself :)

#4 January 20 2011

Kassem
Member

Re: Exercise - File Stats

Hmm, I couldn't get the paragraph count right... I'm using C# and checking whether the character is a '\n' or '\r', but it doesn't seem to work...
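
One possible reason for this kind of trouble is that Windows files end each line with "\r\n", so testing single characters counts every line break (sometimes twice) rather than every paragraph. The sketch below, in Go, is only one way to approach it: it assumes a paragraph is a run of consecutive non-blank lines, which the exercise does not actually specify, and the countParagraphs name is made up for the example.

```go
package main

import (
	"fmt"
	"strings"
)

// countParagraphs counts runs of consecutive non-blank lines.
// Assumption: paragraphs are separated by one or more blank lines.
func countParagraphs(text string) int {
	// Normalize Windows (\r\n) and old Mac (\r) line endings to \n.
	text = strings.ReplaceAll(text, "\r\n", "\n")
	text = strings.ReplaceAll(text, "\r", "\n")

	count := 0
	inParagraph := false
	for _, line := range strings.Split(text, "\n") {
		if strings.TrimSpace(line) == "" {
			inParagraph = false // a blank line ends the current paragraph
		} else if !inParagraph {
			inParagraph = true // first non-blank line of a new paragraph
			count++
		}
	}
	return count
}

func main() {
	sample := "First paragraph,\r\nsecond line.\r\n\r\nSecond paragraph.\r\n"
	fmt.Println(countParagraphs(sample)) // prints 2
}
```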

#5 January 20 2011

Georges
Member

Re: Exercise - File Stats

Does the . (dot) count as a word?
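
The exercise statement doesn't define what a word is, so the answer depends on the definition you pick. As a rough sketch in Go, here are two common definitions side by side: whitespace-separated tokens, where a lone dot counts, and runs of letters, digits and underscores, where it doesn't. The function names are made up for the example.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// wordRun matches a run of letters, digits or underscores.
var wordRun = regexp.MustCompile(`\w+`)

// countFields treats anything separated by whitespace as a word,
// so a standalone "." is counted.
func countFields(text string) int {
	return len(strings.Fields(text))
}

// countWordRuns only counts runs of word characters,
// so a standalone "." is ignored but "It's" becomes two words.
func countWordRuns(text string) int {
	return len(wordRun.FindAllString(text, -1))
}

func main() {
	fmt.Println(countFields("Stop . Go"), countWordRuns("Stop . Go")) // prints 3 2
	fmt.Println(countFields("It's late"), countWordRuns("It's late")) // prints 2 3
}
```

Neither definition is the right one; the point is just that the reported word counts can differ for the same file.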

#6 January 20 2011

Georges
Member

Re: Exercise - File Stats

string allString = "";
int spacesCount = 0;
int carriageReturn = 0;
int multipleCR = 0;
string path = "YourFilePath";

int AllCharacters = 0;
int excludingSpaces = 0;
int Words = 0;
int Paragraphs = 0;

private void ReadFile()
{
    // StreamReader lives in System.IO; "using" closes the file once it has been read.
    using (StreamReader sr = new StreamReader(path))
    {
        allString = sr.ReadToEnd();
    }
}

private void Display()
{
    string trimmed = allString.Trim();
    int nmb = allString.Length - trimmed.Length; // The number of leading and trailing white spaces...

    // 1 // All Characters...
    AllCharacters = allString.Length;

    // 2 // Calculating the number of spaces... // Including Leading and Trailing White Spaces...
    for (int i = 0; i < allString.Length; i++)
    {
        if (allString[i] == ' ')
            spacesCount++;

        #region Carriage Return...

        if (allString[i] == 13) // ASCII code of the carriage return (\r)
        {
            carriageReturn++;

            // A paragraph break is \r\n\r\n, so look two characters ahead for a second CR.
            // The bounds check avoids reading past the end of the string.
            if (i + 2 < allString.Length && allString[i + 2] == 13)
            {
                multipleCR++;
                i++; // Skip the \n that follows this CR...
            }
        }
        #endregion
    }

    int paragCount = carriageReturn + 1 - multipleCR;
    int wordCount = spacesCount - nmb + paragCount;

    // 2 // Characters excluding spaces... (the fields are ints, so no ToString() here)
    excludingSpaces = allString.Length - spacesCount; // Could have used : allString.Replace(" ", "").Length

    // 3 // Words Count...
    Words = wordCount;

    // 4 // Paragraphs Count...
    Paragraphs = paragCount;
}

And Rahmu, I couldn't use the test file you included.

I won't try again for now; I have a morning class tomorrow and I desperately need to sleep.

Edit: Results for the file you included:

All Characters: 2924
Excluding WhiteSpaces: 2532
Word Count: 400
Paragraphs: 8

Last edited by Georges (January 21 2011)

#7 January 21 2011

xterm
Moderator

Re: Exercise - File Stats

Didn't bother much with the requirements, so I'll fail this. Just built something real quick to get something close to Georges's result.

P.S.: You won't be able to test this on the groovy web console because GAE disables fetchurl, but if you have groovy installed, just throw it in the console.

html_data = "http://pastebin.com/raw.php?i=0AG377K8".toURL().text.split('.dtd">')[1]
data = new XmlParser().parseText(html_data).depthFirst().pre.text().trim()

chars          = data.length()
chars_no_space = data.findAll { c -> c != ' ' }.size()
words          = data.split(' ').size()
paragraphs     = data.split('\n\n').size()

Result:

chars          : 2910
chars_no_space : 2518
words          : 393
paragraphs     : 8

Last edited by xterm (January 21 2011)

#8 January 21 2011

Kassem
Member

Re: Exercise - File Stats

private void calculateStats()
        {
            int totalChars = 0;
            int numNoSpace = 0;
            int numWords = 0;
            int numParags = 0;

            if (string.IsNullOrEmpty(_contents))
            {
                MessageBox.Show("File was empty!", "Error!");
            }
            else
            {
                foreach (char c in _contents)
                {
                    totalChars += 1;
                    if (c != ' ') numNoSpace += 1;
                }

                numWords = _contents.Split(new string[] { " " }, StringSplitOptions.None).Count();
                numParags = _contents.Split(new string[] { "\n\r" }, StringSplitOptions.None).Count(); 

                NumCharsTB.Text = totalChars.ToString();
                NumWordsTB.Text = numWords.ToString();
                NumParagsTB.Text = numParags.ToString();
                NoSpaceTB.Text = numNoSpace.ToString();
            }
        }

Produces the same results as xterm's.

@Georges, are you sure about the Word Count?

#9 January 21 2011

Joe
Member

Re: Exercise - File Stats

/* 
 * Word Count version 0.2
 * 
 * Changelog: 
 *      - Added character and paragraph count
 *      - Command line arguments. 
 * 
 * 
 * author:          Joe "rahmu" Hakim Rahme
 * last modified:   21/01/2010
 * 
 */


#include <stdio.h>
#include <stdlib.h>

enum {out, in}; /* In this case enum is more appropriate than bool */

void wordCount (FILE* myFile){

    /* Initializing variables */
    int myChar = 0;
    int wordNumber = 0;
    int charNumber = -1; /* Can't really understand the bug, so I initialize at -1 ... for now */
    int parNumber = 0; 
    int state = out; /* Starting out of a word */
    int parState = out;

    while ((myChar = fgetc(myFile)) != EOF){
        charNumber++;

        if (myChar == ' ' || myChar == '\t'){ 
            state = out;
        }

        else if (myChar == '\n' || myChar == '\r'){
            state = out;
            parState = out;
        }

        else if (state == out){ /* We enter a new word */
            state = in;
            wordNumber++;
        }

        if (parState == out){
            parState = in;
            parNumber ++;
        }
    }
    
    fprintf(stdout, "%d word(s)\n", wordNumber);
    fprintf(stdout, "%d character(s)\n", charNumber);
    fprintf(stdout, "%d paragraphs(s)\n", parNumber/2);
}


int main (int argc, char *argv[]) 
{
    FILE* myFile = NULL;

    /* Check the argument count first: argv[1] is NULL when no path is given */
    if (argc < 2) /* Is the file path supplied ? */
    {
        fprintf (stderr, "No file supplied as an argument.\n");
        return EXIT_FAILURE;
    }

    myFile = fopen(argv[1], "r");

    if (myFile == NULL) /* Testing if fopen worked */
    {
        fprintf(stderr, "There was a problem accessing the file. Please make sure the file exists and is available.\n");
        return EXIT_FAILURE;
    }

    wordCount (myFile);
    fclose(myFile);

    return EXIT_SUCCESS;
}

#10 March 2 2016

Joe
Member

Re: Exercise - File Stats

Go code showing:

  • File manipulation

  • CLI argument parsing

  • regex

package main

import (
	"fmt"
	"io/ioutil"
	"log"
	"os"
	"regexp"
)

func get_characters_count(text []byte) int {
	return len(text)
}

func regex_non_space_count(text []byte) int {
	chars := regexp.MustCompile("[^ ]")
	return len(chars.FindAll(text, -1))
}

func get_words_count(text []byte) int {
	words := regexp.MustCompile("\\w+")
	return len(words.FindAll(text, -1))
}

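func get_paragraph_count(text []byte) int {
	// "\n\n" matches the breaks between paragraphs, so n breaks mean n+1 paragraphs.
	// This assumes the text is not empty and does not end with a blank line.
	paragraphs := regexp.MustCompile("\n\n")
	return len(paragraphs.FindAll(text, -1)) + 1
}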

func main() {
	if len(os.Args) != 2 { // exactly one argument (the file path) is expected
		log.Fatal("Wrong number of CLI arguments")
	}

	data, err := ioutil.ReadFile(os.Args[1])
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("chars count:\t%d\n", get_characters_count(data))
	fmt.Printf("non ' ' count:\t%d\n", regex_non_space_count(data))
	fmt.Printf("words count:\t%d\n", get_words_count(data))
	fmt.Printf("parag count:\t%d\n", get_paragraph_count(data))
}
