Auto-Summarisation of Text Documents

Who hasn't seen the AutoSummarize tool in Word from MS Office? Some of you might also have played with it by changing the compression quotient, i.e. the ratio of the size of the summary to the size of the document.

I really don't know the algorithm used inside (of course, how could I? It's Microsoft!). Nor do I attempt to address every issue of auto-summarization here. I was just trying to analyze what exactly we do when we summarize a document. It involves a lot of factors besides picking the important details out of the document: its title, how short the summary is going to be, and who is going to read it (students, professors, researchers, laymen, etc.), among others.
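To make the compression quotient concrete, here is a minimal sketch of a summariser that picks "important" sentences until a word budget is used up. This is certainly not what Word does; the scoring rule (average word frequency per sentence) and the names `summarise` and `ratio` are my own assumptions, chosen only because they are the simplest thing that works.

```python
import re
from collections import Counter


def _words(text):
    """Lower-case word tokens; a deliberately crude tokenizer."""
    return re.findall(r"[a-z']+", text.lower())


def summarise(document, ratio=0.25):
    """Keep the highest-scoring sentences within ratio * (document word count).

    ratio plays the role of the compression quotient: summary size / document size.
    Scoring by average word frequency is an assumption, not Word's method.
    """
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", document) if s.strip()]
    freq = Counter(_words(document))
    budget = max(1, int(len(_words(document)) * ratio))

    def score(sentence):
        tokens = _words(sentence)
        return sum(freq[t] for t in tokens) / len(tokens) if tokens else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)

    chosen, used = [], 0
    for i in ranked:
        size = len(_words(sentences[i]))
        # Always keep at least one sentence, then respect the word budget.
        if chosen and used + size > budget:
            continue
        chosen.append(i)
        used += size

    # Emit the chosen sentences in their original order.
    return " ".join(sentences[i] for i in sorted(chosen))
```

Calling `summarise(text, ratio=0.3)` asks for a summary of roughly 30% of the original word count. Note that it only ever reuses sentences already in the document.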

One thing that struck me was shortening big concepts made of small words into a small set of big words. By big and small, I mean how complex a word's meaning is, not how long it is. Dilution is a very common concept in language: we tend to dilute a word into simpler words, just as we do in our computer languages (representing bigger modules using smaller, basic modules). Summarization is of course related to picking out the prominent words in a document, but I have not seen these tools use new words to describe the document. Not all concepts are spelled out in a document in detail. Besides, it is quite helpful to simply replace a set of words with a single word that conveys more or less the same meaning. In short, I'm talking about mapping a meaning (semantics) to a single defining word.
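As a toy illustration of replacing a set of words with a single word, the sketch below uses a small hand-written table. The table `PHRASE_TO_WORD` and the function `condense` are hypothetical; a real system would need a far larger mapping and some notion of context, which plain string matching does not have.

```python
import re

# Hypothetical, hand-written mapping from "diluted" phrases to denser words.
PHRASE_TO_WORD = {
    "a person who writes programs": "a programmer",
    "at the same time": "simultaneously",
    "make smaller in size": "shrink",
}


def condense(text, mapping=PHRASE_TO_WORD):
    """Replace each known phrase in text with its single-word equivalent."""
    # Try longer phrases first so a short phrase never pre-empts a longer one.
    for phrase in sorted(mapping, key=len, reverse=True):
        pattern = r"\b" + re.escape(phrase) + r"\b"
        text = re.sub(pattern, mapping[phrase], text, flags=re.IGNORECASE)
    return text
```

For example, `condense("She is a person who writes programs and tests them at the same time.")` gives "She is a programmer and tests them simultaneously."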

Is it easy? NO. However, with semantic graphs (as in RDF), perhaps a first stage can be built. If the meaning or structure of the words changes slightly, the graph is only minimally perturbed. Complexity increases when several mappings exist for the same definition, that is, when the same word can be composed from different words by rearranging the smaller words into different groups. This may be reduced by grouping similar words, so that picking any word from a group to replace another does not change the meaning of the sentence.
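The grouping idea can be sketched without any graph machinery: put similar words into groups and always substitute one representative per group, so sentences that differ only by such words collapse to the same form. The groups in `SYNONYM_GROUPS` are made up for illustration, and this is a stand-in for a semantic graph, not an RDF implementation.

```python
# Hypothetical groups of words treated as interchangeable in meaning.
SYNONYM_GROUPS = [
    {"big", "large", "huge"},
    {"quick", "fast", "rapid"},
]


def build_canonical(groups):
    """Map every word to one representative of its group."""
    canonical = {}
    for group in groups:
        representative = min(group)  # alphabetically first, so the choice is deterministic
        for word in group:
            canonical[word] = representative
    return canonical


def normalise(sentence, canonical):
    """Rewrite a sentence using only group representatives."""
    return " ".join(canonical.get(w, w) for w in sentence.lower().split())
```

With `canonical = build_canonical(SYNONYM_GROUPS)`, both "a large fast car" and "a huge rapid car" normalise to "a big fast car", so the several possible mappings reduce to one.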

Suggestions invited.

4 comments:

  1. Hey, organize your blogs... it's really tough to find new posts because of the 6-7 headings.

  2. Dude, sorry man.. depending on my mood I ended up creating a whole bunch of them.. I've made one more called "index-sbharti".. I'll put all the updates in that one :D

  3. We couldn't talk about it in detail the other day, but I am writing about Kautilya as a dissenter. I think there is enough evidence to prove that. First of all, he was the first political realist in India. The prevalent ideology in India was nowhere near realism. Its foreign policy, for one, wasn't a set of ruthless policies aimed only at self-interest and nothing else. He gave some revolutionary ideas (for his time) about women, slaves and shudras. He thus clearly departed from the prevalent ideology of his time. I have evidence to back up my claim, which makes life easier :)
    I am majoring in Pol Sc with a concentration in International Relations, and my minor is philosophy :)

  4. Btw, I will be talking about his importance in today's times, which I think is enormous.
