DuAL (Document Access Language)
This page was last modified on 21 May 2016, at 22:02.
Designed by: Maxim Timchenko
DuAL (Document Access Language) is a very high-level domain-specific language with dynamic typing that combines features of SQL and object-oriented programming to provide a way to manipulate documents.
DuAL was created in 2004 by Maxim Timchenko, a student at Korolev Samara State Aerospace University, as a tool for automatically creating educational resources. His scientific advisor, V. A. Prokhorov, also took part in creating the language. Timchenko studied existing document manipulation tools, such as Visual Basic for Applications, and decided to create a new language.
The language was influenced by ActionScript and SQL and processes documents according to a set of rules.
Reasons for creating DuAL
When a person reads text, they take individual characters and combine them into words, using spaces as logical separators. Periods and line breaks separate sentences; sentences are combined into paragraphs, and paragraphs into chapters. These logical units can be mapped onto an unlimited number of visual layouts, which makes them hard for a computer to parse. The solution was a method that operates on the logical units of a document, called text objects: character, word, sentence, paragraph, chapter, section. Access to these objects opens up great opportunities for data analysis.
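The hierarchy of text objects described above can be illustrated with a minimal Python sketch. This is not DuAL code and not the author's implementation: the `TextObject` class and the `build_objects` function are hypothetical names, and the sketch assumes that a blank line separates paragraphs, that a sentence ends at a period followed by whitespace or at a line break, and that words are separated by whitespace.

```python
import re
from dataclasses import dataclass, field


@dataclass
class TextObject:
    """A logical unit of a document (hypothetical model, not DuAL's own)."""
    kind: str   # "paragraph", "sentence" or "word"
    text: str
    children: list = field(default_factory=list)


def build_objects(document: str) -> list:
    """Split plain text into paragraphs, then sentences, then words."""
    paragraphs = []
    # Assumption: a blank line separates paragraphs.
    for para_text in re.split(r"\n\s*\n", document.strip()):
        paragraph = TextObject("paragraph", para_text)
        # Assumption: a sentence ends at a period followed by whitespace,
        # or at a single line break.
        for sent_text in re.split(r"(?<=\.)\s+|\n", para_text.strip()):
            if not sent_text:
                continue
            sentence = TextObject("sentence", sent_text)
            # Words are runs of characters separated by whitespace.
            sentence.children = [TextObject("word", w) for w in sent_text.split()]
            paragraph.children.append(sentence)
        paragraphs.append(paragraph)
    return paragraphs


if __name__ == "__main__":
    tree = build_objects("Hello world. This is DuAL.\n\nSecond paragraph here.")
    for paragraph in tree:
        for sentence in paragraph.children:
            print(paragraph.kind, "/", sentence.text, "/", len(sentence.children), "words")
```

Once the text is held as nested objects, a rule can address "every sentence" or "every word" directly instead of searching raw characters, which is the point the paragraph above makes.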
Differences between DuAL and other text processing systems
Commonly used text editors and word processors cannot automatically modify large texts. For example, changing the formatting of parts of a large document or a set of documents, or splitting a huge document into several smaller ones, requires manual work. Programming languages built into text editors can handle such tasks, but they have several disadvantages that prevent their wide use:
- data-modifying commands are very hard to implement;
- commands can be used only on a document which was previously prepared and formatted for automatic processing;
- code cannot be reused on documents with a different format.
To overcome these disadvantages, a new principle of data processing was proposed, based on an object model of text control. One of its key features is the analysis of a human-provided set of rules according to which the contents of the document are changed. Implementing this method makes it possible to solve the following problems:
- Changing the information displayed in an electronic document;
- Decomposing the contents of a document for use in e-learning courses and Internet resources;
- Converting sets of documents from one type to another;
- Direct access to document metadata;
- Printing pieces of data from source documents.
Syntax
- Key functions, system constants and logical pointers are written in capital letters;
- Methods and indicators describing parameters are written with an initial capital letter;
- Object names are written in lowercase letters;
- The symbol . separates a method from the object it is called on;
- The symbol ' begins a comment;
- The symbol " delimits text data. To use a quotation mark inside text, precede it with a slash /;
- The character # at the beginning of a line indicates a preprocessor directive (reserved).
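The lexical conventions in this list can be sketched as a small Python classifier. It is only an illustration under stated assumptions: `classify_token` is a hypothetical helper, not part of DuAL, and the comment marker is taken to be the apostrophe used in the article's example listing.

```python
import re


def classify_token(token: str) -> str:
    """Guess the role of a single token from its spelling alone."""
    # Assumption: comments start with an apostrophe, as in the example listing.
    if token.startswith("'"):
        return "comment"
    if token.startswith("#"):
        return "preprocessor directive"
    if token.startswith('"'):
        return "text data"
    # Anything else must look like an identifier to be classified.
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", token):
        return "unknown"
    if token.isupper():
        return "key function, constant or pointer"
    if token[0].isupper():
        return "method or parameter"
    if token.islower():
        return "object name"
    return "unknown"


if __name__ == "__main__":
    for tok in ["OPEN", "Mask", "source_files", '"File.docx"', "'Loading files"]:
        print(tok, "->", classify_token(tok))
```

Because the role of a name is visible from its case, a reader (or an editor's syntax highlighter) can tell functions, methods and objects apart without a symbol table.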
A rule is a logically complete language construct, which consists of:
- Control function;
- Data processing parameters;
- Logical output pointer;
- Main object.
Scheme: Function parameters POINTER object
All objects matching the rules are processed:
- The minimum structural unit is a symbol or an object that cannot be divided into smaller parts;
- Sets of structural units are words, tables and compound objects;
- Logical text elements are sentences, paragraphs, chapters, and documents.
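The rule scheme (Function parameters POINTER object) can be illustrated by splitting a rule into its four parts. The Python sketch below is an interpretation, not the DuAL parser: the `Rule` class and `parse_rule` function are hypothetical, and it assumes that the main object is the last whitespace-separated token, that the pointer is the upper-case token immediately before it, and that everything between the control function and the pointer is the parameter expression.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    function: str    # control function, e.g. OPEN
    parameters: str  # data processing parameters
    pointer: str     # logical output pointer, e.g. AS
    target: str      # main object


def parse_rule(line: str) -> Rule:
    """Split one rule into the four parts named in the scheme."""
    function, _, rest = line.strip().partition(" ")
    # Assumption: the last two tokens are the pointer and the main object.
    parts = rest.rsplit(None, 2)
    if len(parts) != 3:
        raise ValueError(f"not a complete rule: {line!r}")
    parameters, pointer, target = parts
    # Key functions and pointers are written in capital letters.
    if not (function.isupper() and pointer.isupper()):
        raise ValueError(f"function and pointer must be upper-case: {line!r}")
    return Rule(function, parameters, pointer, target)


if __name__ == "__main__":
    print(parse_rule('OPEN Mask("File.docx") AS source_files'))
```

A real parser would need to handle nested parentheses and quoted text containing spaces; this sketch only shows how the four components line up in a simple rule.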
DuAL has a free IDE, created by the developer of the language. After starting the application, the user sees the "Data preparation wizard" window, which consists of several sub-windows:
- Selection of the base type of document processing;
- Choosing the result documents location;
- Choosing source documents;
- Control rules designer, which sets the parameters of the text blocks used to build a data sample from the main sequence. Each of the three blocks must correspond to a text block in the document;
- Adding/replacing text blocks in files;
- Print Settings.
Upon completion of the "Data preparation wizard", a new project file is created, containing the rules for processing the source documents. The "Edit control regulations" and "Information processing status" windows then appear and, by analyzing the content of the files created by the wizard, build a project tree with a list of documents to be processed. In the control rules editor, each logical element is automatically highlighted with a corresponding color for readability. The user can change the structure of the rules in accordance with their goals and personal preferences.
The system automatically keeps track of all changes made to the structure of the rules. Based on these data, it modifies the list of documents to be processed and monitors the correctness of the logical structures and the availability of the documents submitted for processing. The user starts the processing of the rules, during which the screen displays information about the current state of the process: the number of uploaded documents, generated text objects, and uploaded files, as well as additional information on the documents. The output information is presented in the form of reports with automatic grouping by categories.
Upon completion of the processing, the user continues to work with the documents using third-party software or the built-in system. The result is a complete e-book, which presents page information as a tree structure and lets the user navigate the content, search the text, and add bookmarks.
'Loading files
OPEN Mask("File.docx") AS source_files
'Detecting visible symbols
GRUP source_files.Symbol(ALL) AS symbol
'Grouping symbols into words
GRUP symbol.Where(NOSELF, LEFT) FOR symbol.Where(NOSELF, RIGHT) AS word
'Grouping words into sentences (новая_строка means "new line")
GRUP word.Where(новая_строка, EDGE, LEFT) FOR word.Where(новая_строка, EDGE, RIGHT) AS sentence
'Detecting chapter names
GRUP sentence.Font(Type "Bold", Size "14", FUZZY) AS chapter_name
'Highlighting chapter names with another color
GRUP chapter_name.Apply(Font(Color "Red")) AS chapter_name_red
'Summing pointers
GRUP source_files.Apply(chapter_name_red) AS changed_files
'Saving files
SAVE changed_files AS Mask("Document%.html")
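For readers unfamiliar with DuAL, the selection and recoloring steps of the listing can be approximated in Python. This is a rough analogue built on assumptions, not a translation of DuAL semantics: the `Sentence` class and `highlight_chapter_names` function are hypothetical, sentences are assumed to carry their font attributes directly, and the meaning of the FUZZY flag is not modeled.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Sentence:
    """A sentence with the font attributes the listing tests (hypothetical model)."""
    text: str
    bold: bool = False
    size: int = 12
    color: str = "Black"


def highlight_chapter_names(sentences, size=14, color="Red"):
    """Return a new list in which bold sentences of the given size are recolored."""
    return [
        replace(s, color=color) if s.bold and s.size == size else s
        for s in sentences
    ]


if __name__ == "__main__":
    doc = [
        Sentence("Chapter 1", bold=True, size=14),
        Sentence("Body text follows."),
    ]
    for s in highlight_chapter_names(doc):
        print(s.text, "->", s.color)
```

The sketch mirrors the shape of the rules: one step selects objects by a formatting criterion, the next applies a change to the selection, and the untouched objects pass through unchanged.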