Misanthrope's Thoughts

What is the cause for the degradation of environment?
Capitalism, corruption, consuming society? - OVERPOPULATION!
Please, save the Planet - kill yourself...

Sunday, December 14, 2014

How to Get Accurate Measurements in QGIS

This post is mainly my answer to the corresponding question.

The main rule for accurate measurements is to define correct CRS for your layers and project. This is somewhat wide theme to talk about so I won't ;-) 

Lets just discuss what you should do when the project CRS can't be easily used for measurement (like Web Mercator) and you want to use Measure tool to calculate distance or area. Go Settings -> Project Properties, in CRS tab enable on the fly transformation; in General tab in Measure tool menu choose ellipsoid for distance calculations. See picture:


When you start measurements with the Measure tool move your cursor over the Measure window - an information about measurement settings will pop up just like that:


Now you can make your measurements and be sure that they are accurate. NOTE that you should always check Measure tool settings like it shown at the picture above because Ellipsoid settings for the Measure tool are not always stored or displayed in Project settings correctly (QGIS 2.6.1).

Sunday, November 30, 2014

A Flawed Research on a TSP Algorithm

A Travelling Salesman Problem (TSP) is a well known computational challenge. A lot of algorithms were developed to solve it or its special cases.

I came around an article authored by Fang Liu 'A dual population parallel ant colony optimization algorithm for solving the travelling salesman problem'. In this article he proposed a modification of an Ant Colony System algorithm for solving TSP and presented results obtained by his algorithm. In the table with results all looked fine - his algorithm was able to provide very good solutions for the TSP instances from TSPLIB (which is the common testing ground for TSP algorithms).

So the researcher presented good results... it seems. But then he decided to show the best routes his algorithm was able to find and annotated them with the corresponding route costs. Lets take a look at one of them. Here you are his best route for the 'att48' instance from the TSPLIB:

Route that claims to be optimal (but the cost is very wrong)
The optimal route for 'att48' and its cost is well-known (it applies to all TSPLIB instanses). Its cost is approximately 33523 (there are different approaches to rounding distances between points). So what we see at the picture above should be the optimal route (or extremely close to it). But dear reader, do you think that you see optimal route? Humans are able to provide very good solutions to TSP instances that consists of not too many points. I bet you can draw far better route yourself. The route from this picture is 1, 8, 46, 33, 20, 17, 43, 27, 19, 37, 6, 30, 36, 28, 7, 18, 44, 31, 38, 9, 40, 15, 12, 11, 47, 21, 13, 25, 14, 23, 3, 22, 16, 41, 34, 2, 29, 5, 48, 39, 32, 24, 42, 10, 45, 35, 4, 26, 1 and its cost is 41052 which is whooping 22% far from optimal! The same story for another illustration in the article.

Here take a look at the optimal route which cost is really 33523:
Actually optimal route for 'att48' with Cost = 33523
So what we can conclude? I do believe that the routes demonstrated in the article are the best routes found by given algorithm, but costs are put-up for the routes and for the table of results in the article as well. I think that author developed an algorithm that wasn't able to find good solutions and provided fraud table with the put-up testing results. And clearly this article wasn't reviewed by scientist that have knowledge in TSP area because these plots are so obviously flawed that one can't overlook it!

No one usually shows plots of the routes they find with their algorithms. I wonder if there are more modern algorithms with the put-up results?

Sunday, November 2, 2014

How to Create Delauney Triangulation Graph from a .shp-file Using NetworkX

For my own project I needed to create a graph based on a Delauney triangulation using NetworkX python library. And a special condition was that all the edges must be unique. Points for triangulation stored in a .shp-file.

I was lucky enough to find this tread on Delauney triangulation using NetworkX graphs. I made a nice function out of it for point NetworkX graphs processing. This function preserves nodes attributes (which are lost after triangulation) and calculates lengths of the edges. It can be further improved (and most likely will be) but even at the current state it is very handy.

Example of use:
import networkx as nx
import scipy.spatial
import matplotlib.pyplot as plt

path = '/directory/'
f_path = path + 'filename.shp'
G = nx.read_shp(f_path)

GD = createTINgraph(G, show = True)


Code for the function:
import networkx as nx
import scipy.spatial
import matplotlib.pyplot as plt

def createTINgraph(point_graph, show = False, calculate_distance = True):
  '''
  Creates a graph based on Delaney triangulation

  @param point_graph: either a graph made by read_shp() from another NetworkX's point graph
  @param show: whether or not resulting graph should be shown, boolean
  @param calculate_distance: whether length of edges should be calculated
  @return - a graph made from a Delauney triangulation

  @Copyright notice: this code is an improved (by Yury V. Ryabov, 2014, riabovvv@gmail.com) version of
                    Tom's code taken from this discussion
                    https://groups.google.com/forum/#!topic/networkx-discuss/D7fMmuzVBAw
  '''

  TIN = scipy.spatial.Delaunay(point_graph)
  edges = set()
  # for each Delaunay triangle
  for n in xrange(TIN.nsimplex):
      # for each edge of the triangle
      # sort the vertices
      # (sorting avoids duplicated edges being added to the set)
      # and add to the edges set
      edge = sorted([TIN.vertices[n,0], TIN.vertices[n,1]])
      edges.add((edge[0], edge[1]))
      edge = sorted([TIN.vertices[n,0], TIN.vertices[n,2]])
      edges.add((edge[0], edge[1]))
      edge = sorted([TIN.vertices[n,1], TIN.vertices[n,2]])
      edges.add((edge[0], edge[1]))


  # make a graph based on the Delaunay triangulation edges
  graph = nx.Graph(list(edges))

  #add nodes attributes to the TIN graph from the original points
  original_nodes = point_graph.nodes(data = True)
  for n in xrange(len(original_nodes)):
    XY = original_nodes[n][0] # X and Y tuple - coordinates of the original points
    graph.node[n]['XY'] = XY
    # add other attributes
    original_attributes = original_nodes[n][1]
    for i in original_attributes.iteritems(): # for tuple i = (key, value)
      graph.node[n][i[0]] = i[1]


  # calculate Euclidian length of edges and write it as edges attribute
  if calculate_distance:
    edges = graph.edges()
    for i in xrange(len(edges)):
      edge = edges[i]
      node_1 = edge[0]
      node_2 = edge[1]
      x1, y1 = graph.node[node_1]['XY']
      x2, y2 = graph.node[node_2]['XY']
      dist = sqrt( pow( (x2 - x1), 2 ) + pow( (y2 - y1), 2 ) )
      dist = round(dist, 2)
      graph.edge[node_1][node_2]['distance'] = dist


  # plot graph
  if show:
    pointIDXY = dict(zip(range(len(point_graph)), point_graph))
    nx.draw(graph, pointIDXY)
    plt.show()

  return graph

Wednesday, October 1, 2014

Interactive Visualisation of the Profitable Amount of Waste to Dispose Illegally

"Wow!" - I said to myself after reading R Helps With Employee Churn post - "I can create interactive plots in R?!!! I have to try it out!"

I quickly came up with an idea of creating interactive plot for my simple model for assessment of the profitable ratio between the volume waste that could be illegally disposed and costs of illegal disposal [Ryabov Y. (2013) Rationale of mechanisms for the land protection from illegal dumping (an example from the St.-Petersburg and Leningrad region). Regional Researches. №1 (39), p. 49-56]. The conditions for profitable illegal dumping can be describes as follows:



Here: k - the probability of being fined for illegal disposal of waste;
P - maximum fine for illegal disposal of waste (illegal dumping);
V - volume of waste to be [illegally] disposed by the waste owner;
E - costs of illegal disposal of waste per unit;
T - official tax for waste disposal per unit.

The conditions for the profitable landfilling can be described as follows:

Here: V1 - total volume of waste that is supposed to be disposed at illegal landfill;
Tc - tax for disposal of waste at illegal landfill per unit;
P1 - maximum fine for illegal landfilling;
E1 - expenditures of the illegal landfill owner for disposal of waste per unit.

Lets plot the graphs (with some random numbers (except for fines) for a nice looking representation) to have a clue how it looks like.


Note that there is a footnote (this post provides nice examples on how to do it) with the values used for plotting - it is important to have to have this kind of indication if we want to create a series of plots.

Now I will show you the result and then will provide the code and some tips.

Playing with the plot

Tips and Tricks

Before I will show you code I want to share my hardly earned knowledge about nuances of the manipulate library. There are several ways to get static plot like that using ggplot, but some of them will fail to be interactive with manipulate.

  1. All the data for the plot must be stored in one dataframe.
  2. All data for plots must be derived from the dataframe (avoid passing single variables to ggplot).
  3. Do not use geom_hline() for the horizontal line - generate values for this line and store them inside dataframe and draw as a regular graph.
  4. To create a footnote (to know exactly which parameters were used for the current graph) use arrangeGrob() function from the gridExtra library.
  5. Always use $ inside aes() settings to address columns of your dataframe if you want plots to be interactive

The Code

library(ggplot2)
library(grid)
library(gridExtra)
library(manipulate)
library(scales)
library(reshape2)

## Ta --- official tax for waste utilisation per tonne or cubic metre.
## k --- probability of getting fined for illegal dumping the waste owner (0 V, y <=> E

max_waste_volume <- 2000 
Illegal_dumping_fine_P <- 300000
Illigal_landfilling_fine_P1 <- 500000
Fine_probability_k <- 0.5
Official_tax_Ta <- 600

# mwv = max_waste_volume
# P = Illegal_dumping_fine_P
# P1 = Illigal_landfilling_fine_P1
# k = Fine_probability_k
# Ta = Official_tax_Ta

updateData <- function(mwv, k, P1, P, Ta){
    
    # creates and(or) updates global data frame to provide data for the plot
    
    new_data <<- NULL
    new_data <<- as.data.frame(0:mwv)
    names(new_data) <<- 'V'
    new_data$IlD <<- k*P1/new_data$V
    new_data$IlD_fill <<- new_data$IlD
    new_data$IlD_fill[new_data$IlD_fill > Ta] <<- NA # we don't want ribbon to fill area above Official tax 
    new_data$IlL <<- Ta-k*P/new_data$V
    
    new_data$Ta <<- Ta
    new_data$zero <<- 0
    dta <<- melt(new_data, id.vars="V", measure.vars=c("IlD", "IlL", "Ta"))
    dta.lower <<- melt(new_data, id.vars="V", measure.vars=c("IlD_fill", "zero", "Ta"))
    dta.upper <<- melt(new_data, id.vars="V", measure.vars=c("Ta", "IlL", "Ta"))
    dta <<- cbind(dta, lower=dta.lower$value, upper=dta.upper$value)
    dta$name <<- factor(NA, levels=c("Illegal landfill owner's\nprofitable ratio",
                                    "Waste owner's\nprofitable ratio", 
                                    "Official tax"))
    dta$name[dta$variable=="IlD"] <<- "Illegal landfill owner's\nprofitable ratio"
    dta$name[dta$variable=="IlL"] <<- "Waste owner's\nprofitable ratio"
    dta$name[dta$variable=="Ta"] <<- "Official tax"
}

updateLabels <- function(k, P1, P, Ta){
    
    ### creates footnote caption for the plot
    
    prob <- paste('Fining probability = ', k, sep = '')
    landfilling_fine <- paste('Illegal landfilling fine = ', P1, sep = '')
    dumping_fine <- paste('Illegal dumping fine = ', P, sep = '')
    tax <- paste('Official tax = ', Ta, sep = '')
    note <<- paste(prob, landfilling_fine, sep = '; ')
    note <<- paste(note, dumping_fine, sep = '; ')
    note <<- paste(note, tax, sep = '; ')
    note
}


plotDumping <- function(mwv, P, P1, k, Ta){
    
    ### this function draws the plot
    
   # initialise plot data
    updateData(mwv, k, P1, P, Ta)
    updateLabels(k, P1, P, Ta)
    
    # draw the plot
    profit <- ggplot(dta, aes(x=dta$V, y=dta$value, ymin=dta$lower, ymax=dta$upper, 
                              color=dta$name, fill=dta$name, linetype=dta$name)) +
        geom_line(size=1.2) + 
        geom_ribbon(alpha=.25, linetype=0) +
        theme(axis.text.x = element_text(angle=0, hjust = 0),
              axis.title = element_text(face = 'bold', size = 14),
              title = element_text(face = 'bold', size = 16),
              legend.position = 'right',
              legend.title = element_blank(),
              legend.text = element_text(size = 12),
              legend.key.width = unit(2, 'cm'),
              legend.key.height = unit(1.2, 'cm'))+
        scale_linetype_manual(values=c(4, 5, 1)) +
        scale_fill_manual(values = c("#F8766D","#00BFC4",NA)) +
        scale_color_manual(values = c("#F8766D","#00BFC4", '#66CD00')) + 
        labs(title="Profitable ratio between the volume \nof illegally disposed waste \nand costs of illegal disposal of waste",
             x="Waste volume, cubic meters",
             y="Cost per cubic meter, RUB") +
        xlim(c(0, max(new_data$V)))+
        ylim(c(0, Ta*1.5))
        
    
    # add a footnote about paramaters used for the current plot
    profit <- arrangeGrob(profit, 
                          sub = textGrob(note, 
                                         x = 0, 
                                         hjust = -0.1, 
                                         vjust=0.1, 
                                         gp = gpar(fontface = "italic", fontsize = 12)))
    
    # show plot
    print(profit)

}


simDumping <- function(max_waste_volume = 2000, 
                       Illegal_dumping_fine_P = 300000,
                       Illigal_landfilling_fine_P1 = 500000,
                       Fine_probability_k = 0.5,
                       Official_tax_Ta = 600) {
    
    ### this function creates interactive plot
    
    max_waste_volume <<- max_waste_volume 
    Illegal_dumping_fine_P <<- Illegal_dumping_fine_P
    Illigal_landfilling_fine_P1 <<- Illigal_landfilling_fine_P1
    Fine_probability_k <<- Fine_probability_k
    Official_tax_Ta <<- Official_tax_Ta
    
    manipulate(suppressWarnings(plotDumping(max_waste_volume, 
                           Illegal_dumping_fine_P,
                           Illigal_landfilling_fine_P1,
                           Fining_probability_k,
                           Official_tax_Ta)
                           ),
               
               # set up sliders
               max_waste_volume = slider(0, 50000, 
                                         initial = max_waste_volume, 
                                         step = 100,
                                         label = 'X axis range'),
               Illegal_dumping_fine_P = slider(0, 5000000, 
                                               initial = Illegal_dumping_fine_P,
                                               step = 10000, 
                                               label = 'Illegal dumping fine (P)'),
               Illigal_landfilling_fine_P1 = slider(0, 5000000, 
                                                    initial = Illigal_landfilling_fine_P1, 
                                                    step = 10000,
                                                    label = 'Illegal landfilling fine (P1)'),
               Fining_probability_k = slider(0, 1, 
                                             initial = 0.5,
                                             step = 0.01,
                                             label = 'Probability of being fined (k)'),
               Official_tax_Ta = slider(0, 3000, 
                                        initial = Official_tax_Ta, 
                                        step = 50,
                                        label = 'Official waste disposal tax (T)')
    )
}

simDumping() # for reasons unknown I have to run this function twice to get proper interactive plot

Saturday, September 13, 2014

The Most Hilarious Rendering "Bug" I've Ever Seen

Can you guess the projection that I used to display a world map at the picture below (EPSG code is left there because I doubt someone would recall it anyway)?


It is the World Bonne projection in QGIS (original shp's CRS is EPSG:4326). I already had some fun with Bonne in this post. The world got torn apart at scale 1:75 000 000. At bigger scales map renders normally:



Fortunately this issue can be easily resolved by disabling "feature simplification by default for newly added layers" under Settings -> Options -> Rendering. 

Monday, September 8, 2014

Schema of the Conservation Areas in Leningradskaya Region: Some Notes for Beginner Mappers

Schema of the Conservation Areas in Leningradskaya Region

About a year ago I was asked to create a small (a B5 size) and simple schema of the conservation areas in Leningradskaya region. I did it using QGIS. Here you are the author version of the schema and several notes that might be helpful for a beginner map-maker.

There was a huge disproportion between areas of different objects and both polygon and point markers were needed to show them. I decided to use Centroid Fill option in polygon style to be able to use only one layer (polygon) instead of two (polygon and point). Using Centroid Fill makes points in centres of the small polygons overlap and mask these tiny polygons. 

All the administrative borers were stored in one layer (and there are far more borders than one see here). They are drawn using rule-based symbology so I didn't even need to subset this layer to get rid of the rest of the polygons in this layer.

All the names of the surrounding countries, regions, city and the water bodies are not labels derived from layers, but labels created inside the map composer. It was quicker and their position was more precise which is crucial for such a small schema.

There was a lack of space for the legend so I had to utilise every bit of canvas to place it. I had to use 3 legend items here. One of them is actually overlapping the other and setting a transparent background for the legends was helping with that.

Finally labels for the conservation areas (numbers) were outlined with white colour to be perfectly readable. Some of them were moved manually (with storing coordinates in special columns of the layer) to prevent overlapping with other labels and data.

P.S. Don't be afraid to argue with the client about the workflow. Initially I was asked it manually digitise a 20 000 x 15 000 pixels raster map of the Leningrad region to extract the most precise borders of the conservation areas (and districts of the region). Of course I could do it and charge more money for the job, but what's the point if some of that borders are not even to be seen at this small schema? So I convinced client to use data from OSM and saved myself a lot of time.

Sunday, August 31, 2014

Donetsk Label at OpenStreetMap

Now, when Donetsk became the hot spot of the civil war at the Ukraine I decided to take a glance at the map of the war zone. Of course I used OSM for this. But despite I knew the approximate location of the city I couldn't find it for several minuets. I was confused... The reason was that "Donetsk" label (label of the city which is the administrative center of the Donetskaya region BTW) at the scales smaller than 10 km is suppressed by the label of adjusted town "Makeevka". WTF?! 

"Donetsk" ("Донецьк") label at OSM is suppressed by the label of adjusted town