Versioning the database structure

A good article on the topic: Versioned migration of the database structure: basic approaches.

Features of database migration with canary releases: On the way to Canary (the article also describes this type of release).

Delivery of changes

Note: An alternative approach is described in the Docker chapter.

So, your code is now civilized and centralized. But what about delivering new functionality to end users? The shipping method may vary: for an in-house thick client (a la an automated workstation) it can be placing the installation package in a network folder; for a website, an archive with scripts. It is important to introduce the concept of a distribution kit – a complete set of software suitable for distribution. Crucially, the distribution kit must not contain contour-dependent settings or specific configuration files. You should be able to “roll out” a once-assembled distribution to any contour (environment) and, with that installation, immediately bring the system to the version fixed in the distribution, including the database migrations we talked about earlier.
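
A minimal sketch of what contour independence can look like in practice: the distribution ships configuration templates, and contour-specific values are substituted only at installation time. The file names and parameter keys below are hypothetical.

```python
# Hypothetical illustration: the distribution contains only a template,
# and contour-specific values are filled in during installation.
from string import Template

def render_config(template_path: str, output_path: str, params: dict) -> None:
    """Fill a contour-independent template with contour-specific values."""
    with open(template_path, encoding="utf-8") as src:
        template = Template(src.read())
    with open(output_path, "w", encoding="utf-8") as dst:
        dst.write(template.substitute(params))

# The same distribution, two different contours:
test_params = {"db_host": "db.test.local", "log_level": "DEBUG"}
prod_params = {"db_host": "db.prod.local", "log_level": "WARNING"}
# render_config("app.conf.template", "app.conf", test_params)
```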

The distribution kit should be built on a separate machine called the build server. The reason is trivial: if the “production” distribution is built on a specific developer’s computer, it can be very difficult to reproduce that configuration in the event of a breakdown or abnormal situation. You can also end up in an awkward position if that developer goes on vacation or quits.

Returning to the conversation about startups. There is a piece of software called GitKraken. I used it for a couple of months, in despair after SourceTree’s final slide into an abyss of bugs. So, I accidentally looked inside the installation package of this git client while practicing extracting information from an EXE. It turned out that the distribution was being built under a specific user’s account, in a folder a la C:\Users\Joey\Documents\…. And they looked so technologically advanced and modern.

Let’s continue the build server theme. It should have a clean, well-defined environment installed – the necessary set of tools and settings to turn the source code into the final distribution. Build steps, rules, and commands are usually configured in software of the Jenkins, GitLab CI, or TeamCity class; in a narrow sense, this can be called the build system. A special agent will need to be installed on the build server, allowing it to interact with the build system. The result of going through the build steps is usually a so-called artifact (or several artifacts); an example of such an artifact is a distribution kit. It makes sense to store build artifacts outside the build system, for example in Nexus or Artifactory. This solves the issue of reliably storing every version of the system’s builds.
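
As an illustration, publishing an artifact can be as simple as an HTTP upload. The sketch below assumes a Nexus raw-style repository that accepts PUT requests with basic authentication; the URL, repository name, and credentials are placeholders.

```python
import requests

def publish_artifact(path: str, version: str) -> None:
    # Hypothetical raw repository layout: <repo>/<app>/<version>/<file>
    url = (
        "https://nexus.example.com/repository/distributions/"
        f"myapp/{version}/myapp-{version}.zip"
    )
    with open(path, "rb") as artifact:
        response = requests.put(url, data=artifact, auth=("ci-user", "ci-password"))
    response.raise_for_status()
    print(f"Published {path} as version {version}")
```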

System version

Within this chapter on the delivery of changes, I also want to touch on the concept of a system version. The version format can be anything:

simple numeric, for example 7, 25, 144;

compound, for example 25.4, 33.1;

canonical (i.e. classic semantic versioning), for example 7.1.33 (MAJOR.MINOR.PATCH);

custom, for example 7.1.33.eb4512a (commit hash added at the end).

Rule: the software receives a unique, sequential version number when it is built. TeamCity, for example, has a build counter – an automatically incrementing counter that is ideal for assigning unique numbers (a sketch of composing such a version follows this list). It follows from the rule that:

there should not be different builds (distributions) with the same version number;

a later build must have a later version.
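
A minimal sketch, assuming the build system injects the counter via an environment variable (the variable name is made up) and the commit hash is taken from git, of how such a version might be composed at build time:

```python
import os
import subprocess

MAJOR, MINOR = 7, 1

def build_version() -> str:
    # BUILD_COUNTER is assumed to be injected by the build system
    # (e.g. TeamCity's build counter); defaults to 0 for local builds.
    counter = os.environ.get("BUILD_COUNTER", "0")
    commit = subprocess.check_output(
        ["git", "rev-parse", "--short", "HEAD"], text=True
    ).strip()
    return f"{MAJOR}.{MINOR}.{counter}.{commit}"   # e.g. 7.1.33.eb4512a

print(build_version())
```

Because the counter only ever grows, a later build automatically gets a later version, and no two builds share a number.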

Additionally, I will note that you should not introduce separate versions for the database, the server side, the thick client, and so on. On the contrary, all components of the system that go into the distribution must share a single end-to-end version number. This approach will help when installing system updates, investigating errors, and logging incidents.

Automatic Deployment

So, distributions are now being created: at the push of a button or automatically on merge into the master branch. You can, of course, install them on the target contours manually, but it is more far-sighted to set up automatic installation. In essence, this is a certain set of instructions and commands executed sequentially. At the same time, it is important to remember that some commands require parameterization for the target contour (since the distribution kit is contour-independent). You can implement this with TeamCity, GitLab, or even Python scripts; I will consider the case of Ansible / AWX.

I will quote one of the articles about Ansible on Habr:

Ansible is a software solution for remote configuration management. It allows you to configure remote machines. Its main difference from other similar systems is that Ansible uses the existing SSH infrastructure, while others (Chef, Puppet, etc.) require a dedicated PKI environment to be installed.

Importantly, Ansible lets you run playbooks against remote hosts. A typical auto-installation scenario may include downloading the distribution, unpacking it, stopping the service, parameterizing configuration files, copying files, running database migrations, and restarting the service. Contour-dependent variables can be organized using inventories, with one inventory per contour (see the sketch below).
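
The playbook itself is beyond the scope here, but since even Python scripts can do the job, here is a plain-Python sketch of the same step sequence. Service names, paths, and the per-contour parameter table (playing the role of an inventory) are all hypothetical.

```python
import subprocess
import zipfile
from string import Template

# Per-contour parameters, analogous to one Ansible inventory per contour.
CONTOURS = {
    "test": {"db_host": "db.test.local", "service": "myapp-test"},
    "prod": {"db_host": "db.prod.local", "service": "myapp-prod"},
}

def deploy(distribution_zip: str, contour: str) -> None:
    params = CONTOURS[contour]
    # Stop the service before touching its files.
    subprocess.run(["systemctl", "stop", params["service"]], check=True)
    # Unpack the contour-independent distribution.
    with zipfile.ZipFile(distribution_zip) as archive:
        archive.extractall("/opt/myapp")
    # Parameterize configuration for the target contour.
    with open("/opt/myapp/app.conf.template", encoding="utf-8") as src:
        rendered = Template(src.read()).substitute(db_host=params["db_host"])
    with open("/opt/myapp/app.conf", "w", encoding="utf-8") as dst:
        dst.write(rendered)
    # Run database migrations shipped with the distribution.
    subprocess.run(["/opt/myapp/migrate.sh"], check=True)
    # Start the service back up.
    subprocess.run(["systemctl", "start", params["service"]], check=True)
```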

Package Manager

A small addition to the build server theme: for external plugins and libraries, it is convenient to use a package manager – for example, pip for Python, NuGet for C#, npm for Node.js. Package manager configurations should be kept in source control along with the main system code, and the build steps on the build server should include restoring and downloading packages according to the configuration file(s). This also makes it easier for a new developer to join the team: they will not need to configure their working environment in any special way, the package manager takes care of everything. The only downside is that some of them are rather heavyweight, but little can be done about that – they still bring more benefit than harm.
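
A minimal sketch of such a restore step, assuming a pinned requirements.txt kept next to the code (the file name and the absence of a virtual environment are simplifications):

```python
import subprocess
import sys

def restore_packages(requirements: str = "requirements.txt") -> None:
    # Install exactly the versions pinned in source control.
    subprocess.run(
        [sys.executable, "-m", "pip", "install", "-r", requirements],
        check=True,
    )

if __name__ == "__main__":
    restore_packages()
```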

Software development: communication

But today we chose Wiren Board. The functionality, the form factor, the flexibility, and the good support satisfied us completely. It cannot be said that this option is cheap, but then our requirements were not modest either. We understand that all good things cost money, and at this stage the price-performance ratio suits us.

It is gratifying that many Geektimes readers immediately recognized the Wiren Board platform in our last article – a pleasant moment that confirmed the popularity of this manufacturer of industrial microcomputers. For our part, we can so far give only positive feedback about their product, and we hope it will always be that way.

Communication between the lower and upper levels

Even if all the elements of the upper and lower levels run like clockwork (not in the sense of showing the time, but in the sense of precision), they must also work together, like a good team.

Communication is a very important part of any interaction, and its quality directly affects the quality of the whole solution. In third-party solutions we often saw communication issues given negligible attention, which greatly narrowed their scope of application, and this unfortunate omission was one of the main drivers behind the development of our RedPine platform.

In our product, we approached communication issues with all seriousness – this applies both to the methods of information transfer themselves, and to the correct compression and packetization of data to avoid losses and problems with insufficient communication channel bandwidth.

The lower-level hardware has all the necessary interfaces for data transfer: GSM, 3G, RS-485, RS-232, TCP/IP. They can function separately or simultaneously and work without problems over weak communication channels. Even if the equipment is located in the tundra or the taiga, it will stay in touch. If necessary (or at the customer’s request), the system can be retrofitted with other communication interfaces.

RPL, the platform’s own data transfer protocol, is responsible for the security of information: it combines encryption, checksum verification of the data stream, and buffering of data in its own memory until a confirmation of receipt arrives from the server. Nothing gets lost along the way.
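
The RPL protocol itself is proprietary, so here is only an illustrative sketch (not RedPine’s actual code) of the buffer-until-acknowledged pattern described above; the names and wire format are assumptions.

```python
import hashlib
from collections import deque

# Packets waiting for a server acknowledgment; kept until confirmed.
_pending: deque = deque()

def send(payload: bytes, transmit) -> None:
    """Checksum the payload, remember it locally, then transmit it."""
    checksum = hashlib.sha256(payload).hexdigest()
    _pending.append((checksum, payload))
    transmit(checksum.encode() + b"|" + payload)

def on_ack(checksum: str) -> None:
    """Drop the acknowledged packet; unconfirmed ones stay for retransmission."""
    for item in list(_pending):
        if item[0] == checksum:
            _pending.remove(item)
            break
```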

RedPine can be easily integrated into existing information systems using the Modbus and SNMP protocols, and the lower-level hardware can be used as an additional gateway.

Top level software

The main task of the upper-level software is to be a kind of hub – the link between the upper-level hardware, the lower-level software, and the human.

That is, the top-level software must provide the user with the necessary interaction with all elements of the monitoring and dispatching system. It is both the brain and the face of RedPine, which means it must be smart, convenient, and likable all at once.

First, about the brain, which is hidden from the user. Here we did not use ready-made solutions; everything had to be written from scratch. This software is responsible for storing, processing, analyzing, and transferring data between the various elements of the upper and lower levels, and, among other things, it was critical for us that all of this be optimized and run quickly on a variety of hardware. Poor optimization can ruin even the best functionality in one fell swoop: if the system is slow, its rich functionality simply cannot be used.

Interface of the diesel generator set monitoring and control system (mnemonic diagram)

Now let’s turn to the face of the system. Appearance matters here, and not only for beauty: everything should be clear and convenient for daily use by people without special training. An incomprehensible interface effectively works against the user, pushing them toward mistakes that can sometimes be fatal and result in large financial losses. This is the understanding our developers started from when designing the visual part of the top-level software. I will talk about the RedPine user interface some other time so as not to stray from the main topic now. However, you can take a look at it right now in the demo version (link) – its interface is no different from the real production versions.

“Soft” lower level

Since the lower-level software runs on the lower-level hardware, it must speak the same language. That is why we had requirements for the controller manufacturer concerning the operating system used and the internal algorithms of the device.

This software is responsible for receiving commands from the upper-level software, processing them, and passing them to the lower-level hardware actuators: the controller, expansion modules, and additional attachments (sensors, control elements, etc.). It is also responsible for the return trip: data received from the lower-level hardware must be processed and passed up to the upper level.

Here it is worth emphasizing one of the important functions of the lower-level software: it converts all kinds of signals from diverse equipment (differing by type, manufacturer, operating logic, and year of release) into a single data format, which allows such “motley” equipment to be monitored and managed from a single center. This is one of the key features we did not find in other monitoring systems, and it is what prompted us to create our own.
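
As an illustrative sketch (not RedPine’s actual code) of what such normalization might look like, each vendor-specific reading is mapped onto one common record; the field names and vendor formats are made up.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class Reading:
    device_id: str
    parameter: str      # e.g. "fuel_level", "output_voltage"
    value: float
    unit: str
    timestamp: datetime

def from_vendor_a(raw: dict) -> Reading:
    # Vendor A reports fuel level as a percentage in a "FUEL" field.
    return Reading(raw["id"], "fuel_level", float(raw["FUEL"]), "%",
                   datetime.now(timezone.utc))

def from_vendor_b(raw: dict) -> Reading:
    # Vendor B reports the same quantity in litres under "fuel_l".
    return Reading(raw["serial"], "fuel_level", float(raw["fuel_l"]), "L",
                   datetime.now(timezone.utc))
```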

There is no user interface here, because this is the internal machinery of the platform; management happens through the top-level interface. Only authorized personnel can directly access the lower-level software.

When we talk about a complete RedPine solution, we always mean several levels of hardware and several levels of software. It is never some magic box that works by itself and can do everything – it is always several systems interconnected by wire or wirelessly. Our platform is flexible enough to build highly specialized solutions, and this flexibility extends to the communication systems used, to the equipment at all levels, and even to the user interface – everything can be customized and configured for special tasks.

How to Manage Software Projects Effectively

Managing software projects is a complicated task that requires a lot of planning and attention. There are many factors that can affect the quality of your software project, such as the technology you are using, the size of your team, and your budget.

Scope management

Whether you are starting a new project or extending an existing one, effective scope management is essential. It helps keep the project on schedule and within budget. Without it, you might experience scope creep, which can lead to lost time, reallocation of resources, and cost overruns. Luckily, there are several steps you can follow to avoid the dangers of scope creep.

The first step in effective scope management is defining the project’s requirements. This can involve collecting information from stakeholders, focus groups, and surveys. It also involves defining the goals, functions, and features of the product.

Next, you must develop a work breakdown structure, which breaks the project’s scope down into manageable pieces. This structure gives teams and individuals a list of deliverables they are responsible for. It can also include tasks and deadlines.

Milestones

Using milestones to manage software projects can be a great way to make sure your team keeps on track. The milestones themselves can vary, depending on the project, but they are generally a time frame or a deadline. The timelines should be visible to everyone, and include both tasks and deliverables.

It is important to set a realistic and achievable goal for each milestone. Not every task is worth attaching to a milestone: some tasks matter more than others, but milestones remain the best way to keep track of the project as a whole.

There are many tools to help you manage your projects. Some of these include milestone plans and diagramming tools. These tools can help you build a timeline to track your progress, which is particularly useful when you’re working on a big project.

KPIs

Using KPIs to manage software projects effectively can provide you with valuable information about the status of your project. They can help you measure and refine your results. You can also use them to identify and eliminate problems in your process. This helps you get to the end of your project faster.

Managing software projects effectively means using KPIs that are specific to your project’s goals. This allows your team to understand and follow your requirements. It also holds your team accountable for consistent output.

For example, the Schedule Performance Index (SPI) can be used to measure the progress of a project. It is calculated by dividing the earned value by the planned value. An SPI of less than one indicates that the project might be behind schedule; a value of one or more indicates that the project is on target or ahead.
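
A small worked example of the index described above, with purely illustrative figures:

```python
# Schedule Performance Index: SPI = earned value / planned value.
earned_value = 80_000    # value of work actually completed to date
planned_value = 100_000  # value of work planned to be completed by now

spi = earned_value / planned_value
print(f"SPI = {spi:.2f}")  # 0.80 < 1, so the project is behind schedule
```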

Documentation

Whether you are building a new software product or enhancing an existing one, documentation is an essential tool to help you manage the project effectively. Without proper documentation, teams can lose track of the project’s progress, resulting in missed deadlines and misinterpretations. In addition, a well-written document can help you avoid a failure by bringing clarity to your project’s goals and requirements.

A comprehensive written communication reduces the chances of work duplication and time spent in meetings. A file naming system is also an effective way to find documents easily. The naming system includes the author, the document type, and other important information.

Another important tool is version control. This allows you to track changes to a document and create an audit trail. This means you can easily spot errors. This also keeps the quality of your project documentation high.

Software development at a nuclear power plant


There is a lot of regulatory documentation in the nuclear industry: GOSTs, internal instructions, regulations, permits… the industry even has its own standardization institute. These documents describe requirements for processes, materials, and equipment operation, and all of them are needed constantly in day-to-day work.

Rosatom nuclear power plants must meet a huge number of requirements, from concrete grades to the resilience of structures against natural disasters. They are spread across a thousand regulations, from federal GOSTs to local standards. Putting them all together, separating the specific requirements from the descriptive parts, and not forgetting anything is not easy.

For example, there is a GOST for producing concrete for formwork: 25 pages, with 27 references to other GOSTs, which in turn have references of their own. In total, several thousand pages need to be processed – and this is a drop in the ocean of regulatory documentation used in Rosatom.

Moreover, the documentation is mostly on paper, which is inconvenient to work with even when scans exist. Rosatom took on the ambitious task of translating these documents into a single internal format for its requirements management system – one convenient for both people and software: collect all the requirements for a new project in a couple of clicks, generate checklists, and produce project documentation.

The data preparation team was faced with the following task:

Marking up the existing documentation into the format of the requirements management system. This is a natural language processing (NLP) task. Previously, experts did this work manually: they processed the documents, classified them, and divided them into paragraphs and elements – requirements, information, tables, figures.

It may seem that this is easy to do with existing programs and modules for text and document recognition, but the documentation at nuclear power plants is too specific and complicated. We therefore decided to write the text markup from scratch in Python: a universal language for quickly testing ideas and building applications, and even the relatively leisurely Python performs well here.

We also use ready-made components: the Pymorphy library, a Django front-end application, and a small Flask test bench for experiments. But the algorithm itself is entirely self-written, without scikit-learn or other such tools, based on semantic text analysis. Sometimes we use Tesseract, but only to a limited extent: its recognition accuracy is not high enough. Images are processed with OpenCV.
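
As a tiny example of the kind of morphological analysis the Pymorphy library provides (assuming the pymorphy2 package): reducing Russian word forms to their normal form, which helps when matching requirement text against a dictionary of terms.

```python
import pymorphy2

morph = pymorphy2.MorphAnalyzer()

for word in ["требованиями", "бетона", "конструкций"]:
    parsed = morph.parse(word)[0]                 # most probable analysis
    print(word, "->", parsed.normal_form, parsed.tag.POS)
```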

Many people know Python, and know it well. But for us it is also important to come up with and implement interesting algorithms in it. You need breadth of outlook, observation, a creative streak, the ability to play on a team rather than withdraw into yourself and a narrow task – and, of course, excellent knowledge of algorithms.

How documentation recognition works

For the user, there is a web application with a GUI into which documents are uploaded.

The algorithm parses them: it assigns each document a type according to the classifier, then analyzes and classifies every component – paragraphs of text, tables, infographics, drawings.

We trained the algorithm on already marked-up documents. The standards are highly formalized, which results in a very high-quality, accurate solution: recognition accuracy is above 99%, and the speed is five times faster than manual processing. That said, experts still analyze illustrations and tables and approve and recheck the textual data. In addition, the algorithm does not get tired or distracted – no human factor.

Afterwards, the elements are collected into a summary table in the database. Each is assigned a specific class, and relationships are built between them: what follows what, and where illustrations and tables are located.
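
As a rough sketch of what a row in such a summary table might contain (the schema below is an assumption for illustration only): a unique ID, a class, and links preserving order and nesting.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class DocElement:
    element_id: str            # unique ID of the fragment
    element_class: str         # "requirement", "information", "table", "figure"
    text: str
    order: int                 # position within the source document
    parent_id: Optional[str]   # section or clause this element belongs to

elements = [
    DocElement("doc1/el-001", "information", "Scope of the standard ...", 1, None),
    DocElement("doc1/el-002", "requirement", "Concrete grade shall be ...", 2, "doc1/el-001"),
]
```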

The result of digitizing normative documentation: the text is divided into fragments and components with unique IDs and classes. Source: Rosatom
Why are tables, infographics, and figures still processed by experts? Because the price of even a small error in recognizing such regulatory elements is extremely high: we are talking about nuclear power plants. Make a mistake of a thousandth of a percent in a table of permissible parameters, and during operation a value could go outside its limits and fundamentally disrupt the operation of the station.

The finished table in the database is already in the native format of Rosatom’s internal requirements management system. The system can also export documents to other popular formats.

While writing our solution, we produced several successful components and algorithms that are suitable for other projects – for example, a classification algorithm based on semantic analysis. It can, in fact, analyze almost any textual source and draw generalizations and conclusions from it. This opens up huge opportunities in information analytics.

What to read and see

natural language processing
Donald Knuth, “The Art of Computer Programming”

Niklaus Wirth “Algorithms and Data Structures”

Conclusion


So far, the popularity of Python in projects related to artificial intelligence, neural networks, machine learning, and other hyped areas continues to grow. It is unlikely that new languages and technologies will be able to challenge its position in the coming years. Moreover, powerful and diverse tooling has grown up around it: there are more than 10,000 libraries and frameworks alone.

What stack do you use for ML, AI, and neural-network tasks? Are you satisfied with Python, do you see problems in its ecosystem, or do you prefer other technologies: JS with TensorFlow.js, R, Julia, C++, Kotlin, or something else?

By the way, if you have questions about the projects and technologies described, do not be shy: the guys from the nuclear power plant monitor the comments and will try to answer them.