Indexing

Computer indexing vs semantic indexing
Computerized indexing is well known among IT people and librarians as "full text search". This indexing has been implemented on larger scale in IBM systems and up to now proved to be quite useful. The computer software picks up the keywords from text. Usually there is a word list that need to be skipped, e.g. conjunctions. The software have knowledge of syntax forms, which is very important for Serbian language for example. The picked terms are sorted alphabetically and here is your primitive index - you guess. But you guess wrong!
Computer indexing belongs to syntax type of derived indexing and brings no essential additional knowledge about text subject. This is the most primitive sort of indexing and it is used often for the low-level quality of information retrieval. Computer can allow searching of such an index by exact phrase or by logical combination of phrases (e.g. you connect terms with AND logical function to search for the simultaneously existing phrases in text, which is known as postcoordination).
Semantic indexing calls for human intervention. In this sort of indexing, it is quite desirable the indexing person knows pertinent area, and he will make e.g. a medical index for medical book from the official point of view of medical discipline. Indexing professional will create indexing terms which need not generally be in that particular syntax form in the text, but by his term choice he will allow for professional readers to precisely find the terms whereabouts in the book. In general case an indexing professional need to perform a whole bunch of logical operations, a few of which we intend to mention later, so the professional or amateur could by their specific and most common way find the usually sought terms. Semantic indexing is the real indexing and computer indexing is ideally a handy and additional means of search.
The essence of indexing
The basic task of indexer is to do the information adjustment to those readers` circle the information is intended for. The good and bad index differs by that reader finds the information in good index and do not find it in bad index. In large documentation systems, especially state officials' ones, the bad index makes relevant documentation worthless for search, degrading yourself to manual search literary or by means of computer.
The good index allows not only for fast search of pertinent information, but also a quick introduction to material, i.e. learning of material. By doing a survey of shorter index you can very fine see also the scope size and it's logical "position" in the pertinent knowledge domain.
As you hopefully understood from the previous, the hand made semantic indexing is by far better, competent and exhaustive with meaning then simple computer selected word list is.
Primary approach to indexing
When approaching to indexing you should start from the fact you do not make another table of contents. Table of contents is the chronological list of titles from material and basically it has no connection to indexing. In indexing you creatively make your own notions as units of thought and supply each one of them with locators (page numbers, positions on shelf, and more of the like) where that notion (possibly in completely different syntax form) is presented in the material. Only such index has the highest value.
In indexing you as an author take a stand about which particular ontological system are you going to implement: e.g. in indexing of material from electronic area you will take a position of electronic discipline typical for electronic textbooks, or you can decide to take a stand of dilettante approach (i.e. adopt terminology to dilettantes), or decide to merge popular and professional choice of index terms.
Start by assuming you are going to make a single level index, i.e. word list with no sub-classes. Later you can deepen the index more.Terms used can comprise one or more words, but they always make one notion (as human thought). E.g. term "organ" and term "artificial organ" are very different and in most cases shall be separately quoted terms.
Terms should be always adopted to search process specifics, i.e. "artificial organ" you should better quote as "organ, artificial - 234", because search is mostly done that way. You can additionally write the term: "artificial organ; see organ, artificial", to make sure that some rarer forms get answered in search.
In aggregate notion from several words you are using so called phase relations, introduced by fantastic Indian librarian and mathematician Ranganathan, a man who named the English systems of classifications, developed for centuries, "an intellectual laziness", and made them almost ridiculously outdated by introducing his own brilliant "facet" method of classification (mutually exclusive notions, extendable and literary warrant).
Phase relations example: notion "Statistics for Librarians - 123" you will transform by means of phase relation to "Librarians, Statistics for - 123".
Digression:
that nations with no historical opportunities to be heard of can have extraordinary individuals which by their great deeds leave behind in dust the creative individuals of other "better known" nations. That leads us to re-think whether "better known" nations maybe have just media and canons noise on their side more then other creative virtues.
Aggregate multi-word notions and precoordinated notions
As I told previously a notion can be formally expressed with one word, or multiple words, typically two. However there is precoordinated notion which consist of two separate notions, both of which can also exist in index on their own. But in precoordinated notion they are (also) joined together and represent unity of human thought. E.g. "Heat treatment of aluminium" as precoordinated notion, and two constituent notions "thermal treatment, aluminium" and "aluminium, thermal treatment". Precoordinated notions are used in situations where that thought or concept is so important for text that you introduce the notion as unity, especially if one or both separate themes "thermal treatment" and "aluminium" do not exist in text. If you still have some questions about precoordinated notions, look at the picture on the right where is I suppose Sophia Loren in the pause of movie shooting. Sophia had more then enough reasons to be separate notion in our index, that being clear if we take into consideration all their specific attributes. On the other hand, depending on the text theme, we can introduce a notion "Sophia Loren and actor X.Y" :-) |
![]() |
Interesting indexing templates
There are a lot of practical approaches to indexing, based on broadly known or private templates of indexers. Those templates are used for two purposes. One purpose is to help you extracting index notions. The other template usage is for determining the order of words in aggregate term:
- things and their parts (physical objects and persons, geographical terms, entities)
- materials (mess nouns, substances, gasses)
- activities and processes (methods, sports, work, activity)
- events and occurrences (social events, abstractions as social occurrences)
- characteristics and states of persons, things, material or actions
- scientific disciplines
- units of measure
- the other not mentioned above, heterogenious
So when reading parts of text, make use of pertinent categories to create your own notion list. The other template is more of general scientific nature:
- entity
- abstraction
- activity
- attribute
- heterogeneous, combined
The order by Keiser
- thing
- process
The order by Ranganathan:
- personality (more of entity, the basic term we speak about)
- material
- energy
- space
- time
Order by Coates:
- thing
- part
- material
- activity
- agent
Medical order by Vickery:
- substance, product
- organ
- constituent
- structure
- shape
- property
- patient, raw material. They are certainly raw materials for doctors...
- action
- operation
- process
- agent
- space
- time
There are such orderings or general categories quite a lot, practically for every area or concept. The very general order I am using myself is this:
- personality
- part
- structure
- form
- material
- process
- operation
- intermediate product
- agent
- discipline
- space
- time
- metadata
Relations
The standard index has both created terms and hierarchy among them along with relations between terms.
- Equivalence
:
salt, see sodium chloride - Hierarchy - notions of more narrow term scope, e.g.:
software -543
NT utility
NT freeware
NT cracked - Hierarchy - notions of broader term scope e.g.:
user software
BT software - association e.g.:
break cylinder - 234
RT car
What to do next
This whole text is organized just as a very direct and short introduction into indexing. The value of information is directly determined by our ability to find the information in acceptable time interval and easy enough. Even most valuable information hidden in organizational chaos - is worthless. When talking about accessibility of information the quality of indexing plays the key role. There are standards and professional regional and national associations of indexers, proving the importance of this profession. For that reason if this articles tickled some interest in yourself you should definitely read a book on indexing. After that you need to make all indexes for your needs, then some for your firm, in order to get at least some basic experience in this.
In the globalist darkness around us, the education if not already in chaos, is submitted to the actual needs of globalist narrow specialization, while the general education, generic world comprehension and human culture is systematically damped in any way possible - by education system, media abuse, blackmail, in one word - by the system in whole. That is why even such elementary knowledge as information categorization and indexing stay undiscovered by broad public, like many other knowledge areas by the way.
25.2.2016