Some Questions Regarding Clustering

wufutura Member Posts: 38 Contributor I
Hi everyone! Hope all are safe, healthy and happy this evening. I have several, apparently atypical, questions regarding three "newer" clustering methods. I wish to use them on polynomial data imported from an Excel spreadsheet with approximately 300 rows, 45 columns and a great many missing values.
1. The Confusion Matrix Cluster - assume one has a known number of points to be clustered, approximately 190 in total. I have been told that the current techniques tend to introduce some bias. This technique claims to be the "gold standard" by combining a "confusion matrix" with a "k-means" cluster. The difference is then "somehow" (emphasize "somehow") computed to yield the important, unbiased, clustered difference. QUESTION(S):
(a) What minimum number of operators, in what order, would I choose in the design window?
(b) What operator would I attach to show, on a statistical / performance basis, that I had accomplished my sought-after goal?
2. The Silhouette Coefficient - the use of two operators in four different ways: (a) a K-means operator; (b) another, separate K-means operator (the identical kind or not?); (c) average the distances between the results yielded by the clustering of the points between (a) and (b); and finally (d) assume that the low values are outliers and the high values are well clustered and an "optimal" number. QUESTION(S):
(a & b) Are these the exact same K-means operators, and how are they minimally arranged in the design view?
(c) Is the "averaging" done with some particular operator?
(d) Which exact operator(s) determine the statistical output that shows the outlier (low-scoring) vs. well-clustered (high-scoring) differences? How are these diagrammed?
3. The Mutual Interaction Information Cluster - the unspecified measurement of how much information is shared between a clustering operator and a "ground truth" classifier. The relationship is meant to detect "non-linear" similarities that effectively reduce bias in the resulting cluster. QUESTION(S):
(a) What is meant by "unspecified measurement", and can it be achieved with a RapidMiner operator? If so, how?
(b) What is meant by a "ground truth" classifier? I am unfamiliar with the term. What would we call it if it's in inventory?
(c) How would we use our operators to both detect and measure "non-linear" similarities?
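
On question 3: in clustering evaluation, "ground truth" usually means a known, externally supplied class label for each row, and mutual information measures how much knowing a row's cluster tells you about that label. The following is only a minimal pure-Python sketch of that idea, not a RapidMiner operator; the function name `mutual_information` and the toy labels are hypothetical. Because the measure uses only co-occurrence counts, it does not assume any linear relationship between the two labelings.

```python
from collections import Counter
from math import log

def mutual_information(clusters, truth):
    """Mutual information (in nats) between two equal-length label lists."""
    if len(clusters) != len(truth):
        raise ValueError("label lists must have the same length")
    n = len(clusters)
    joint = Counter(zip(clusters, truth))   # counts of (cluster, class) pairs
    cluster_counts = Counter(clusters)
    truth_counts = Counter(truth)
    mi = 0.0
    for (c, t), n_ct in joint.items():
        # p(c,t) * log( p(c,t) / (p(c) * p(t)) ), written with raw counts
        mi += (n_ct / n) * log(n_ct * n / (cluster_counts[c] * truth_counts[t]))
    return mi

# Hypothetical toy data: four rows with known classes "a" and "b".
truth = ["a", "a", "b", "b"]
perfect = [0, 0, 1, 1]      # clusters match the classes exactly
unrelated = [0, 1, 0, 1]    # clusters carry no information about the classes

mi_perfect = mutual_information(perfect, truth)
mi_unrelated = mutual_information(unrelated, truth)
print(mi_perfect, mi_unrelated)
```

A clustering that reproduces the known classes scores the maximum possible value for this toy data, while one that is independent of the classes scores zero.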

Please include many, many simple diagrams / screenshots for my simple mind. Thank you and have a great evening. Talk tomorrow, I hope & trust. Richard
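
On question 2: as the silhouette coefficient is usually defined, it needs only one clustering result, not two K-means runs. For each point, a is the mean distance to the other points in its own cluster, b is the smallest mean distance to the points of any other cluster, and the score is (b - a) / max(a, b). Values near 1 suggest a well-placed point; values near 0 or below suggest a poorly placed one. A minimal pure-Python sketch, assuming Euclidean distance and hypothetical toy points (this is not a RapidMiner process):

```python
from math import dist

def silhouette_scores(points, labels):
    """Per-point silhouette values for one clustering (Euclidean distance)."""
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores = []
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            scores.append(0.0)  # convention: a singleton cluster scores 0
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(dist(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return scores

# Hypothetical toy data: two tight groups far apart.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = silhouette_scores(points, [0, 0, 1, 1])   # sensible grouping
bad = silhouette_scores(points, [0, 1, 0, 1])    # deliberately wrong grouping
print(sum(good) / len(good), sum(bad) / len(bad))
```

Averaging the per-point scores gives one number per clustering, which is how the coefficient is commonly used to compare different choices of k.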

Best Answer

  • jacobcybulski Member, University Professor Posts: 391 Unicorn
    edited September 2020 Solution Accepted
    @wufutura I am not exactly sure about the rest of the questions; however, if you are getting errors reading in this World Bank Excel file, make sure you deal with the junk lines at the top. You will need to specify the valid range of cells to read, i.e. A4:AR268, and then the position of the header row, which is 4. It will then read in!
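
To see what a range such as A4:AR268 implies, it can be broken down into a count of junk rows to skip, data rows, and columns. The helper below is only an illustrative pure-Python sketch; the names `column_number` and `describe_range` are hypothetical, and it assumes the header sits on the first row of the range, as in the answer above.

```python
import re

def column_number(letters):
    """Convert an Excel column label such as 'AR' to a 1-based index."""
    number = 0
    for ch in letters.upper():
        number = number * 26 + (ord(ch) - ord("A") + 1)
    return number

def describe_range(cell_range):
    """Split a range like 'A4:AR268' into row and column counts."""
    match = re.fullmatch(r"([A-Za-z]+)(\d+):([A-Za-z]+)(\d+)", cell_range)
    if match is None:
        raise ValueError(f"not a valid cell range: {cell_range!r}")
    first_col, first_row, last_col, last_row = match.groups()
    first_row, last_row = int(first_row), int(last_row)
    return {
        "header_row": first_row,             # assumption: header is the first row of the range
        "skipped_rows": first_row - 1,       # junk lines above the header
        "data_rows": last_row - first_row,   # rows below the header
        "columns": column_number(last_col) - column_number(first_col) + 1,
    }

info = describe_range("A4:AR268")
print(info)
```

For this range the sketch reports three skipped rows, a header on row 4, 264 data rows and 44 columns, which is a quick sanity check against what the import preview shows.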

Answers

  • wufutura Member Posts: 38 Contributor I
    ATTENTION! I just discovered that Ingo has done a brief video on this very subject of unbiased clustering, and that the operator exists in the Operators area under the title "Agglomerative Clustering." I have six questions:

    1. Given that I have now apparently found a suitable in-house Operator for the needed task, what are the preferred settings of the three currently offered?
    2. What simple diagram can someone provide me with that will allow me to do this kind of clustering without much fuss?
    3. Do we have an Operator that will allow me to draw a Lorenz Curve from the resultant, newly clustered data points?
    4. How do I get the "Impute Missing Values" Operator to work with the proper settings, since it always seems to malfunction, usually offering up the same complaints?
    5. How do I properly load a dataset? I know this sounds like a stupid question, but it's always hit-and-miss for me.
    6. Do I need any special output evaluative Operator to use last in line here to make sure that the proper clustering really happened?
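
On question 3 in this list: a Lorenz curve is built by sorting non-negative values in ascending order and plotting the cumulative share of the total against the cumulative share of the population. The sketch below only computes the curve's points in pure Python; the function name `lorenz_curve` and the cluster sizes are hypothetical, and which attribute to use from the clustered data is left as an assumption.

```python
def lorenz_curve(values):
    """Return (population share, cumulative value share) points from (0, 0) to (1, 1)."""
    if not values or any(v < 0 for v in values):
        raise ValueError("need a non-empty list of non-negative values")
    ordered = sorted(values)
    total = sum(ordered)
    if total == 0:
        raise ValueError("values must not all be zero")
    points = [(0.0, 0.0)]
    running = 0.0
    for rank, value in enumerate(ordered, start=1):
        running += value
        points.append((rank / len(ordered), running / total))
    return points

# Hypothetical example: the sizes of four clusters.
cluster_sizes = [40, 10, 30, 20]
curve = lorenz_curve(cluster_sizes)
for x, y in curve:
    print(f"{x:.2f}  {y:.2f}")
```

The resulting pairs can be exported and drawn as a line chart; the further the line sags below the diagonal, the more unequal the distribution.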

    Thanks everyone! Richard
  • wufutura Member Posts: 38 Contributor I
    Agglomerative Clustering Question
    • OK, I have questions, but it is time for bed. In the meantime, maybe someone in a different timezone can take a quick peek at this snip file AND the accompanying data file as well? Question: what is going on? Can someone submit to me a simple diagram showing how I would properly set up the operators and their settings to get what I'm saying I'm looking for? Excel file (to be clustered) attached at bottom...




  • wufutura Member Posts: 38 Contributor I
    Please notice that I sent my entire data file and just want to know three things:
    1. How do I properly load it without getting an error?
    2. How do I use the Agglomerative Clustering Operator with proper settings?
    3. Could someone provide a simple, simple snippet of a diagram so I can set this up in the design view?
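
For orientation on what agglomerative clustering does with its settings: it starts with every row as its own cluster and repeatedly merges the two closest clusters, and the main choice is the linkage rule that defines "closest". Three common rules are single (nearest pair), complete (farthest pair) and average linkage. The code below is a naive pure-Python sketch of that procedure, not the RapidMiner operator; the names `agglomerative` and `LINKAGES` and the toy points are hypothetical, and it assumes numeric rows with no missing values.

```python
from math import dist

# Each linkage rule turns the list of pairwise distances between two clusters into one number.
LINKAGES = {
    "single": min,
    "complete": max,
    "average": lambda distances: sum(distances) / len(distances),
}

def agglomerative(points, n_clusters, linkage="average"):
    """Naive bottom-up clustering; returns clusters as lists of point indexes."""
    if not 1 <= n_clusters <= len(points):
        raise ValueError("n_clusters must be between 1 and the number of points")
    combine = LINKAGES[linkage]
    clusters = [[i] for i in range(len(points))]  # start: one cluster per point
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                distances = [
                    dist(points[i], points[j])
                    for i in clusters[a]
                    for j in clusters[b]
                ]
                d = combine(distances)
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # merge the closest pair
        del clusters[b]
    return clusters

# Hypothetical toy data: two tight pairs and one distant point.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0), (5.0, 20.0)]
result = agglomerative(points, n_clusters=3, linkage="average")
print(result)
```

Since every merge depends on distances between rows, rows with missing values would have to be imputed or filtered out before a procedure like this can run.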
