Information indexing

Published: 29 February 2016 | Written by Super User


How many times have you browsed the index at the end of a book? Do you consider it useful? Do you think a book can do without an index?

Consider, for a start, a simple question: is there an 'intellectual' worker who can do without a mass of literature, personal working notes, numerous notebooks, and written or recorded lectures and lessons? It is hard to imagine anyone who attended a college-level school and has no need for a large amount of written or electronically stored information. Over a lifetime of work, an intellectual collects an impressive volume of literature and a large number of personal records. It is therefore even harder to imagine that such an educated person could work in a profession without a basic knowledge of how valuable, hard-won information should be classified and indexed.

Indexing

In this article we shall not deal with theoretical foundations or discuss the various existing definitions of terms. I shall be very practical, and accurate enough to reach the essence of the subject, even at the expense of stepping outside the official definitions where that is necessary for the reader to get to the bottom of it.

In its most basic form, indexing is the procedure of providing a list of terms (officially called descriptors) to accompany a book, an article or any other information, each with a location (officially called a locator) where the interested reader can find that term. More precisely, an index is at least an ordered list of notions, ideas and concepts: units of human thought that serve as semantic representatives of certain information. The picture on the right shows an excerpt from one of my indexes. You can spot an obvious hierarchy at first glance: there are basic categories with subcategories below them. For example, there is the category "alati rucni" (hand tools), and one of its subcategories is "testere rucne" (hand saws), on page 131.
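The hierarchy described above can be sketched as a small data structure. This is only an illustration in Python, using English renderings of the example terms ("hand tools", "hand saws"); every other heading and page number is made up, and the function name `render` is hypothetical.

```python
# Two-level index: main heading -> subheading -> list of page locators.
# Only "hand saws" on page 131 comes from the example in the text;
# all other entries are invented for the illustration.
index = {
    "hand tools": {"hand saws": [131], "chisels": [127, 140]},
    "adhesives": {"wood glue": [88]},
}

def render(index):
    """Return the index as text lines, headings and subheadings sorted A-Z."""
    lines = []
    for heading in sorted(index):
        lines.append(heading)
        for sub in sorted(index[heading]):
            pages = ", ".join(str(p) for p in index[heading][sub])
            lines.append(f"    {sub} - {pages}")
    return lines

print("\n".join(render(index)))
```

Each subheading carries its own locators, so the reader can go from the broad category straight to the page.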

Computer indexing vs semantic indexing

Computerized indexing is well known among IT people and librarians as "full text search". It has been implemented on a large scale in IBM systems and has so far proved quite useful. The software picks the keywords out of the text. Usually there is a list of words to be skipped, e.g. conjunctions. The software also has knowledge of grammatical word forms, which is very important for the Serbian language, for example. The picked terms are sorted alphabetically, and there is your primitive index, you would guess. But you would guess wrong!

Computer indexing belongs to the syntactic type of derived indexing and adds no essential knowledge about the subject of the text. It is the most primitive sort of indexing and is often used for low-quality information retrieval. The computer lets you search such an index by exact phrase or by a logical combination of phrases (e.g. you connect terms with the logical AND function to find phrases that occur together in the text, which is known as postcoordination).
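To make the idea concrete, here is a minimal sketch of such a derived keyword index in Python. The stop-word list, the sample pages and the function names are assumptions made for the illustration, and the sketch knows nothing about inflected word forms, which a real system for Serbian would need.

```python
import re
from collections import defaultdict

# Hypothetical stop-word list; a real system would use a much longer one.
STOP_WORDS = {"a", "an", "and", "the", "of", "is", "in", "to", "for"}

def build_keyword_index(pages):
    """Map every keyword to the sorted page numbers on which it occurs."""
    index = defaultdict(set)
    for page_number, text in pages.items():
        for word in re.findall(r"[a-z]+", text.lower()):
            if word not in STOP_WORDS:
                index[word].add(page_number)
    # Sort keywords alphabetically, as in a printed index.
    return {word: sorted(locators) for word, locators in sorted(index.items())}

def search_all(index, *terms):
    """Postcoordinated search: pages that contain every term (logical AND)."""
    page_sets = [set(index.get(term.lower(), [])) for term in terms]
    return sorted(set.intersection(*page_sets)) if page_sets else []

# Invented sample text, one string per page.
pages = {
    1: "Heat treatment of aluminium",
    2: "The treatment of steel",
    3: "Aluminium and steel in heat exchangers",
}
index = build_keyword_index(pages)
print(index["aluminium"])                      # [1, 3]
print(search_all(index, "treatment", "steel")) # [2]
```

Note that the index only records which words occur where. It cannot tell that page 3 is about heat exchangers rather than heat treatment, which is exactly the weakness described above.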

Semantic indexing calls for human intervention. In this sort of indexing it is highly desirable that the indexer knows the pertinent area, so that, for example, a medical index for a medical book is made from the official point of view of the medical discipline. An indexing professional creates index terms that need not appear in the text in that particular form, but the choice of terms allows professional readers to find precisely where each term is treated in the book. In general, the indexer needs to perform a whole series of logical operations, a few of which are mentioned later, so that both professionals and amateurs can find the commonly sought terms in the way most natural to them. Semantic indexing is the real indexing; computer indexing is at best a handy additional means of search.

The essence of indexing

The basic task of the indexer is to adapt the information to the circle of readers it is intended for. A good index differs from a bad one in that the reader finds the information in the good index and does not find it in the bad one. In large documentation systems, especially those of state offices, a bad index makes the relevant documentation worthless for search and reduces you to searching by hand, literally page by page, or by means of a computer.

A good index allows not only a fast search for pertinent information but also a quick introduction to the material, i.e. learning the material. By surveying a shorter index you can also see very well the scope of the material and its logical "position" in the pertinent knowledge domain.

As you have hopefully understood from the above, hand-made semantic indexing is by far better, more competent and richer in meaning than a simple computer-selected word list.

Now I propose that you pause reading at this point, take any notebook of yours, and try to make a simple index for its first 10 pages, following no directions and using only the earlier excerpt from my index as an example. When you have done that, read on to see some ideas on how it should (really) be done.

Primary approach to indexing

When approaching indexing, you should start from the fact that you are not making another table of contents. A table of contents is the sequential list of titles in the material and basically has no connection to indexing. In indexing you creatively form your own notions as units of thought and supply each of them with locators (page numbers, shelf positions and the like) showing where that notion (possibly in a completely different form of words) is presented in the material. Only such an index has the highest value.

In indexing you, as the author, take a stand on which particular ontological system you are going to implement. For example, when indexing material on electronics you can take the position of the discipline, typical of electronics textbooks, or take an amateur's approach (i.e. adapt the terminology to amateurs), or merge popular and professional index terms.

Start by assuming you are going to make a single-level index, i.e. a word list with no sub-classes. Later you can deepen the index. Terms can comprise one or more words, but each always forms one notion (one human thought). For example, the terms "organ" and "artificial organ" are very different and in most cases should be listed as separate terms.

Terms should always be adapted to the specifics of the search process, i.e. "artificial organ" is better listed as "organ, artificial - 234", because that is how people mostly search. You can additionally write the entry "artificial organ; see organ, artificial" to make sure that rarer forms are also answered in a search.
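The inversion and the accompanying "see" reference can be sketched in a few lines of Python. The function names are hypothetical, and the sketch assumes that the last word of the term is the one readers look up, which holds for simple adjective + noun terms but not in general.

```python
def invert(term):
    """Turn 'artificial organ' into 'organ, artificial'.

    Assumption: the last word is the head noun readers search under.
    Single-word terms are returned unchanged.
    """
    words = term.split()
    if len(words) < 2:
        return term
    return f"{words[-1]}, {' '.join(words[:-1])}"

def entries(term, page):
    """Return the main entry plus a 'see' reference from the natural word order."""
    inverted = invert(term)
    if inverted == term:
        return [f"{term} - {page}"]
    return [f"{inverted} - {page}", f"{term}; see {inverted}"]

for line in entries("artificial organ", 234):
    print(line)
# organ, artificial - 234
# artificial organ; see organ, artificial
```

Only the main entry carries the locator; the "see" reference merely redirects, so a page number never has to be maintained in two places.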

In an aggregate notion of several words you are using so-called phase relations, introduced by the fantastic Indian librarian and mathematician Ranganathan, a man who called the English classification systems, developed over centuries, "an intellectual laziness", and made them almost ridiculously outdated by introducing his own brilliant "facet" method of classification (mutually exclusive notions, extensibility and literary warrant).

Phase relations example: the notion "Statistics for Librarians - 123" is transformed by means of a phase relation into "Librarians, Statistics for - 123".
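As a rough sketch, the same rotation can be done mechanically in Python. The list of linking words and the function name are assumptions made for the illustration; a human indexer decides where one phase ends and the next begins by meaning, not by a word list.

```python
# Hypothetical list of linking words that separate the two phases.
LINKING_WORDS = ("for", "of", "in", "on")

def rotate_phases(title):
    """Move the phase after the linking word to the front of the entry."""
    words = title.split()
    for i, word in enumerate(words):
        # The linking word must have text on both sides to split on it.
        if word.lower() in LINKING_WORDS and 0 < i < len(words) - 1:
            first = " ".join(words[: i + 1])   # e.g. "Statistics for"
            second = " ".join(words[i + 1:])   # e.g. "Librarians"
            return f"{second}, {first}"
    return title  # no linking word found: leave the term as it is

print(rotate_phases("Statistics for Librarians"))
# Librarians, Statistics for
```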

Digression:

What the Serb Nikola Tesla was in the field of electronics, the Indian Ranganathan was in librarianship. This leads us to the thought that nations with no historical opportunity to be heard of can have extraordinary individuals who, by their great deeds, leave the creative individuals of other, "better known" nations behind in the dust. That leads us to reconsider whether the "better known" nations perhaps have the noise of media and canons on their side more than other creative virtues.

Aggregate multi-word notions and precoordinated notions

As I said earlier, a notion can be formally expressed with one word or with several, typically two. However, there is also the precoordinated notion, which consists of two separate notions, both of which can exist in the index on their own. In the precoordinated notion they are (also) joined together and represent one unit of human thought. For example, "Heat treatment of aluminium" is a precoordinated notion, with the two constituent entries "thermal treatment, aluminium" and "aluminium, thermal treatment".

Precoordinated notions are used where the thought or concept is so important for the text that you introduce the notion as a unit, especially if one or both of the separate themes, "thermal treatment" and "aluminium", do not exist in the text on their own.
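A precoordinated notion can be entered under each of its constituents, so that the reader finds it whichever word they start from. The following Python sketch shows one way to generate both entries; the function name and the page number are made up for the illustration.

```python
def precoordinated_entries(first, second, page):
    """Return both rotations of a two-part precoordinated notion.

    Each rotation leads with a different constituent, so a reader who
    looks under either word finds the combined notion.
    """
    return [
        f"{first}, {second} - {page}",
        f"{second}, {first} - {page}",
    ]

# The page number 57 is invented for the example.
for entry in sorted(precoordinated_entries("thermal treatment", "aluminium", 57)):
    print(entry)
# aluminium, thermal treatment - 57
# thermal treatment, aluminium - 57
```

Unlike the AND search of a computer index, the combination is made once by the indexer, in advance, and the reader only has to look it up.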

If you still have questions about precoordinated notions, look at the picture on the right, which shows, I suppose, Sophia Loren during a break in shooting a film. Sophia has more than enough reasons to be a separate notion in our index, which is clear if we take into consideration all her specific attributes. On the other hand, depending on the theme of the text, we can introduce the notion "Sophia Loren and actor X.Y" :-)

Interesting indexing templates

There are many practical approaches to indexing, based on widely known or private templates of indexers. These templates serve two purposes. One is to help you extract index notions. The other is to determine the order of words in an aggregate term:

things and their parts (physical objects and persons, geographical terms, entities)
materials (mass nouns, substances, gases)
activities and processes (methods, sports, work, activity)
events and occurrences (social events, abstractions as social occurrences)
characteristics and states of persons, things, material or actions
scientific disciplines
units of measure
other, not mentioned above, heterogeneous

So when reading parts of the text, use the pertinent categories to create your own list of notions. The other template is of a more general scientific nature:

entity
abstraction
activity
attribute
heterogeneous, combined

The order by Kaiser:

thing
process

The order by Ranganathan:

personality (more of entity, the basic term we speak about)
matter
energy
space
time

The order by Coates:

thing
part
material
activity
agent

Medical order by Vickery:

substance, product
organ
constituent
structure
shape
property
patient, raw material. They are certainly raw materials for doctors...
action
operation
process
agent
space
time

There are quite a lot of such orderings or general categories, practically one for every area or concept. The very general order I use myself is this:

personality
part
structure
form
material
process
operation
intermediate product
agent
discipline
space
time
metadata
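Such an order can be applied mechanically once each word of an aggregate term has been assigned a category. The Python sketch below uses the general order listed above; the sample words and the categories assigned to them are my own assumptions, made only for the illustration.

```python
# The general order of categories listed above.
FACET_ORDER = [
    "personality", "part", "structure", "form", "material", "process",
    "operation", "intermediate product", "agent", "discipline",
    "space", "time", "metadata",
]
RANK = {name: position for position, name in enumerate(FACET_ORDER)}

def build_heading(components):
    """Order (word, category) pairs by category rank and join them."""
    ordered = sorted(components, key=lambda pair: RANK[pair[1]])
    return ", ".join(word for word, _ in ordered)

# Hypothetical components, given in arbitrary order.
heading = build_heading([
    ("welding", "process"),
    ("19th century", "time"),
    ("steel", "material"),
    ("bridges", "personality"),
])
print(heading)  # bridges, steel, welding, 19th century
```

Whatever order the words appear in the text, the heading always comes out the same, which keeps the index consistent from entry to entry.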

Relations

A standard index has both the created terms and a hierarchy among them, along with relations between the terms.

Equivalence, e.g.:
salt, see sodium chloride
Hierarchy - notions of narrower scope (NT, narrower term), e.g.:
software - 543
NT utility
NT freeware
NT cracked
Hierarchy - notions of broader scope (BT, broader term), e.g.:
user software
BT software
Association (RT, related term), e.g.:
brake cylinder - 234
RT car
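These relations can be stored so that each one is recorded in both directions: if "utility" is a narrower term of "software", then "software" is a broader term of "utility". A minimal Python sketch follows, with hypothetical function names; treating "see" as a one-way reference is my assumption.

```python
from collections import defaultdict

# term -> relation kind -> list of target terms
relations = defaultdict(lambda: defaultdict(list))

# NT and BT mirror each other; RT is symmetric.
# "see" is treated here as a one-way reference (an assumption).
RECIPROCAL = {"NT": "BT", "BT": "NT", "RT": "RT"}

def add(term, kind, target):
    """Record one relation, skipping duplicates."""
    if target not in relations[term][kind]:
        relations[term][kind].append(target)

def relate(term, kind, target):
    """Record a relation and, where one exists, its reciprocal."""
    add(term, kind, target)
    if kind in RECIPROCAL:
        add(target, RECIPROCAL[kind], term)

relate("salt", "see", "sodium chloride")
for narrower in ("utility", "freeware", "cracked"):
    relate("software", "NT", narrower)
relate("user software", "BT", "software")
relate("brake cylinder", "RT", "car")

print(relations["software"]["NT"])  # ['utility', 'freeware', 'cracked', 'user software']
print(relations["utility"]["BT"])   # ['software']
print(relations["car"]["RT"])       # ['brake cylinder']
```

Recording the reciprocal automatically means the reader can move up, down or sideways from any term without the indexer having to type every relation twice.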

What to do next

This whole text is organized as a very direct and short introduction to indexing. The value of information is directly determined by our ability to find it easily and within an acceptable time. Even the most valuable information, hidden in organizational chaos, is worthless. When it comes to the accessibility of information, the quality of indexing plays the key role. There are standards, and regional and national professional associations of indexers, which shows the importance of this profession. So if this article has sparked some interest in you, you should definitely read a book on indexing. After that, make all the indexes you need for yourself, then some for your firm, in order to get at least some basic experience.

In the globalist darkness around us, education, if not already in chaos, is subordinated to the needs of narrow globalist specialization, while general education, a broad comprehension of the world and human culture are systematically suppressed in every way possible: by the education system, media abuse and blackmail, in a word, by the system as a whole. That is why even such elementary knowledge as information categorization and indexing remains undiscovered by the broad public, like many other areas of knowledge, by the way.

25.2.2016

Славиша Нешић

РАЗУМ (Reason) instead of media, humanism instead of globalism, recreation instead of sport