Frequently asked questions

Important: In this context, a "pattern" refers to a musical fragment that recurs throughout the piece or corpus with some degree of similarity (controlled by the `dist_min` threshold and the chosen viewpoints). This definition is computational and focuses on sequence repetition.

It may differ from the traditional musicological understanding of a "motive," which often involves deeper structural or expressive significance.

The algorithm finds recurring *sequences*, which may or may not align perfectly with musically significant motives.
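To make the role of viewpoints and the similarity threshold more concrete, here is a minimal sketch. It is not the PoC's implementation: the interval viewpoint, the normalized Hamming distance, and the `tolerance` argument are illustrative assumptions. In particular, it assumes the threshold acts as the maximum normalized distance at which two sequences still count as the same pattern, which is how this FAQ describes `dist_min`.

```python
def interval_viewpoint(midi_pitches):
    """Map a pitch sequence to successive melodic intervals in semitones."""
    return [b - a for a, b in zip(midi_pitches, midi_pitches[1:])]


def normalized_hamming(seq_a, seq_b):
    """Fraction of positions that differ between two equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    if not seq_a:
        return 0.0
    return sum(x != y for x, y in zip(seq_a, seq_b)) / len(seq_a)


def is_similar(seq_a, seq_b, tolerance):
    """Assumed rule: sequences match when their distance is within tolerance."""
    return normalized_hamming(seq_a, seq_b) <= tolerance


# Hypothetical fragments given as MIDI pitch numbers.
motif = [60, 62, 64, 65, 67]        # C D E F G
transposed = [67, 69, 71, 72, 74]   # same contour, a fifth higher
varied = [60, 62, 64, 66, 67]       # one altered note (F sharp)

motif_iv = interval_viewpoint(motif)
transposed_iv = interval_viewpoint(transposed)
varied_iv = interval_viewpoint(varied)

# Under the interval viewpoint the transposition is an exact repetition,
# while the varied fragment only matches once the tolerance is relaxed.
print(is_similar(motif_iv, transposed_iv, tolerance=0.0))
print(is_similar(motif_iv, varied_iv, tolerance=0.0))
print(is_similar(motif_iv, varied_iv, tolerance=0.5))
```

The same pair of fragments can therefore be "the same pattern" or not depending on the viewpoint and threshold, which is why a computational pattern need not coincide with a motive in the musicological sense.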

The algorithm identifies all sequences meeting the specified criteria (`min_length`, `max_length`, `min_support`, `dist_min`). Due to the iterative merging and expansion process based on similarity, it's common to find:

  • Overlapping Patterns: A longer pattern might contain slightly shorter patterns that also meet the support threshold. The final pruning step removes patterns fully contained within others, but partial overlaps can remain.
  • Slight Variations: If `dist_min` is greater than 0.0, the algorithm groups sequences that are similar but not necessarily identical. This can lead to pattern "families" where members have minor differences according to the chosen viewpoints and distance function.
  • Sensitivity to Parameters: The results are highly sensitive to the input parameters. Experimenting with `min_length`, `max_length`, `min_support`, and especially `dist_min` and the selected viewpoints can significantly alter the output.
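The containment pruning mentioned above can be illustrated with a small sketch. This is an assumption-laden simplification, not the PoC's actual pruning code: patterns are plain tuples of viewpoint values, containment means "occurs as a contiguous run", and support counts are ignored. The helper names `contains` and `prune_contained` are hypothetical.

```python
def contains(longer, shorter):
    """True if `shorter` occurs as a contiguous run inside `longer`."""
    n, m = len(longer), len(shorter)
    return any(longer[i:i + m] == shorter for i in range(n - m + 1))


def prune_contained(patterns):
    """Drop every pattern that is fully contained in a kept, longer pattern."""
    # Visit the longest patterns first so shorter ones are tested against them.
    unique = sorted(set(patterns), key=len, reverse=True)
    kept = []
    for candidate in unique:
        if not any(contains(k, candidate) for k in kept):
            kept.append(candidate)
    return kept


# Hypothetical interval patterns.
patterns = [
    (2, 2, 1, 2),  # longest pattern
    (2, 2, 1),     # prefix of the longest one: removed
    (2, 1, 2),     # suffix of the longest one: removed
    (1, 2, 3),     # shares only (1, 2) with the longest one: kept
]

pruned = prune_contained(patterns)
print(pruned)
```

The last pattern survives because it only partially overlaps the longest one, which is the situation the first bullet describes.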

This can unfortunately happen sometimes. There are a few common reasons:

  1. Errors within the score file itself: The input file might contain encoding errors or structural inconsistencies that the underlying music21 library cannot parse correctly (e.g., unclosed repeat bars, measure inconsistencies, corrupted notation elements). Ensuring your score is well-formed and validated can help.
  2. Corrupt file: The file itself might be damaged or incomplete, preventing it from being read properly.
  3. Internal preprocessing limitations: The steps that convert the raw score into viewpoint representations might encounter musical structures or notation conventions they were not designed to handle. Music notation is broad and complex, so preprocessing that covers the common cases may still fail on unusual or highly complex scores.

Trying to load and parse the file directly with `music21` might give more specific error messages. Using a cleaner or simpler version of the score, if possible, might also resolve the issue.
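One way to do this is a short standalone check, sketched below. It relies on `music21.converter.parse`, which is music21's general-purpose loader. The `diagnose_score` helper and its `parse` argument are hypothetical names introduced here so the check can also be exercised without music21 installed.

```python
def diagnose_score(path, parse=None):
    """Try to parse a score file and return (ok, message)."""
    if parse is None:
        # Imported lazily so the helper can be defined without music21.
        from music21 import converter
        parse = converter.parse
    try:
        score = parse(path)
    except Exception as exc:
        # Report the exception type and text, which usually names the problem.
        return False, f"{type(exc).__name__}: {exc}"
    return True, f"parsed without errors: {type(score).__name__}"


# Typical use (requires music21; the file name is a placeholder):
#   ok, message = diagnose_score("my_score.musicxml")
#   print(ok, message)
```

If the file fails here as well, the problem most likely lies in the score or in music21's handling of it rather than in the pattern discovery step.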

This Proof-of-Concept (PoC) directly accepts only a limited set of symbolic music formats that can be processed by the music21 library (MusicXML, ABC notation, MIDI, and Humdrum Kern are the most tested).

  • If your score is in another symbolic format (like Sibelius .sib, Finale .musx, Dorico .dorico, Guitar Pro .gp*, etc.): You can often use the original software or a dedicated editor like MuseScore (which is free and supports many formats) to export it to MusicXML (.xml or .mxl). MusicXML is generally the recommended format as it tends to preserve the most detailed notation information accurately for analysis.
  • If your score is in a non-symbolic format like a PDF or an image file (.png, .jpg, etc.): You first need to convert it into a symbolic format. This requires Optical Music Recognition (OMR) software (e.g., Audiveris (free), PhotoScore, SmartScore). These tools "read" the image and attempt to generate a symbolic file (often MusicXML). Be aware that OMR output usually needs manual proofreading and correction before it is reliable enough to use as input for pattern analysis.

Therefore, whenever possible, obtain or convert your score to MusicXML for best results with this tool.
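The decision described above can be summarized as a small lookup by file extension. This is only a sketch: the extension sets are assumptions based on the formats named in this answer, and the PoC itself may detect formats differently. The `conversion_advice` helper is hypothetical.

```python
from pathlib import Path

# Assumed extensions for the formats mentioned in this FAQ.
DIRECT = {".musicxml", ".xml", ".mxl", ".abc", ".mid", ".midi", ".krn"}
NEEDS_EXPORT = {".sib", ".musx", ".dorico"}
NEEDS_OMR = {".pdf", ".png", ".jpg", ".jpeg"}

LOAD = "load directly"
EXPORT = "export to MusicXML from the notation software or MuseScore"
OMR = "run OMR first, then proofread the resulting MusicXML"
UNKNOWN = "unknown format: try converting to MusicXML"


def conversion_advice(filename):
    """Suggest the next step for a score file based on its extension."""
    suffix = Path(filename).suffix.lower()
    if suffix in DIRECT:
        return LOAD
    # Guitar Pro files use several extensions (.gp, .gp5, .gpx, ...).
    if suffix in NEEDS_EXPORT or suffix.startswith(".gp"):
        return EXPORT
    if suffix in NEEDS_OMR:
        return OMR
    return UNKNOWN


for name in ["fugue.mxl", "etude.gp5", "scan.PDF", "notes.txt"]:
    print(name, "->", conversion_advice(name))
```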

The identified patterns serve as structured data for both musicological and computational exploration:

  1. Musicological Interpretation:
    • Researchers examine the patterns' musical relevance (are they motives, themes, common figures?).
    • They compare pattern usage across different pieces, composers, genres, or periods to find stylistic markers.
    • Variations within pattern families are studied to understand musical development.
  2. Computational & Statistical Analysis:
    • Pattern Profiles: Analyze the frequency and distribution of patterns within pieces or sections.
    • Hypothesis Testing: Statistically test ideas, such as whether certain patterns are more common in specific genres or composers.
    • Feature Extraction: Use techniques like TF-IDF to identify patterns that are particularly distinctive of a piece or group within a larger collection.
    • Machine Learning: Train models using pattern information (e.g., counts, TF-IDF scores) to classify pieces by genre, composer, or other attributes.
    • Clustering: Group pieces based on the similarity of their overall pattern content to reveal stylistic groupings.
    • Relational Analysis: Investigate how patterns relate to each other (e.g., common sequences or overlaps).

In essence, the pattern discovery output fuels deeper qualitative interpretation and enables various quantitative methods to analyze musical style, structure, and evolution.
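As an illustration of the feature-extraction idea, the sketch below computes a basic TF-IDF score from per-piece pattern counts. The input layout (a dictionary of pattern counts per piece), the pattern identifiers, and the counts are all made up for the example; the PoC's real output format may differ and would need to be converted first.

```python
import math


def tf_idf(pattern_counts):
    """Compute TF-IDF scores from {piece: {pattern_id: count}}."""
    n_pieces = len(pattern_counts)

    # Document frequency: in how many pieces does each pattern occur?
    doc_freq = {}
    for counts in pattern_counts.values():
        for pattern in counts:
            doc_freq[pattern] = doc_freq.get(pattern, 0) + 1

    scores = {}
    for piece, counts in pattern_counts.items():
        total = sum(counts.values())
        scores[piece] = {
            # Term frequency times plain (unsmoothed) inverse document frequency.
            pattern: (count / total) * math.log(n_pieces / doc_freq[pattern])
            for pattern, count in counts.items()
        }
    return scores


# Hypothetical pattern counts for three pieces.
counts = {
    "piece_a": {"P1": 4, "P2": 1},
    "piece_b": {"P1": 3, "P3": 2},
    "piece_c": {"P1": 5},
}

scores = tf_idf(counts)
for piece, piece_scores in scores.items():
    print(piece, piece_scores)
```

A pattern that occurs in every piece (here `P1`) scores zero, while patterns confined to a single piece score higher the more often they occur there. The resulting vectors could then feed the classification or clustering steps listed above.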

This repository offers a simplified proof-of-concept (PoC) showcasing a core subset of functionalities. The full, proprietary `SimPG-Music` algorithm, developed by Enrique Gutiérrez Álvarez, is significantly more comprehensive and powerful. Key differences include:

  • Corpus-Level Analysis: The full algorithm is designed to efficiently analyze entire collections (corpora) of musical pieces, not just single files, enabling broader comparative studies.
  • Advanced Configuration & Aggregation: It features an optimized workflow allowing users to run multiple parameter configurations (e.g., varying min_length, min_support, dist_min) simultaneously. Results can then be aggregated, making it easier to capture patterns relevant at different structural scales or under varying similarity constraints (e.g., requiring higher support for shorter patterns versus longer ones).
  • Expanded Viewpoint Library: Access to a wider and potentially more nuanced range of musical feature representations ("viewpoints") beyond those included in this PoC.
  • Enhanced Scalability & Performance: Significant optimizations for handling larger datasets (corpora with many pieces or very long individual works) much more efficiently than the PoC.
  • Sophisticated Distance Metrics: May include more advanced algorithms for measuring similarity between musical sequences, potentially tailored to specific musical features.
  • Additional Analytical Features: Capabilities extending beyond basic pattern finding, potentially including pattern classification, hierarchical pattern analysis, advanced visualization options, or integration with other music analysis frameworks.

In essence, this PoC demonstrates the *fundamental concept* of viewpoint-based similarity pattern discovery but lacks the robustness, scalability, configuration flexibility, and advanced features of the complete proprietary system.

While feedback and suggestions are welcome, please understand that this repository is a limited PoC. Development effort is focused on the main proprietary algorithm. Adding significant new features to this *public* PoC is unlikely unless it aligns with broader demonstration goals.

If you require specific functionalities:

  1. Check the Full Algorithm: The feature might already exist in the proprietary version.
  2. Discuss Custom Solutions: Depending on your needs, a custom solution or collaboration might be possible.

In either case, please contact the developer (Enrique Gutiérrez Álvarez) to discuss your requirements.

The full SimPG-Music algorithm is proprietary intellectual property and is not open source. Access is typically granted through:

  1. Collaboration: Engaging in a research or development project with the creator.
  2. Licensing: Arranging a specific licensing agreement for academic or potentially commercial use.

Access is determined on a case-by-case basis. Please contact Enrique Gutiérrez Álvarez (enrique.gutierrez1990@gmail.com) directly to inquire about possibilities, detailing your intended use case.

Yes. If you require assistance beyond the scope of this documentation, need help tailoring the approach to your specific research questions, require support integrating this tool (or concepts from it) into a larger workflow, or are interested in custom analyses, paid support and consulting services are available.

This can range from help with parameter tuning and result interpretation for the PoC to discussions about leveraging the full algorithm's capabilities for your project. Rates vary based on the complexity and scope of the required support. Academic collaborations might also be considered under different terms.

Please contact Enrique Gutiérrez Álvarez (enrique.gutierrez1990@gmail.com) with details about your needs to discuss options and receive a quote.

Please report bugs directly via email to the developer, Enrique Gutiérrez Álvarez (enrique.gutierrez1990@gmail.com).

To help fix the issue quickly, please include the following in your email:

Subject: [SimPG-Music PoC Bug] Brief Description

  1. Bug Description: What went wrong?
  2. Command Used: The exact `python run.py ...` command you ran.
  3. Expected vs. Actual Outcome: What should have happened vs. what did happen (include error messages/tracebacks).
  4. Minimal Reproducible Example:
    • Attach the smallest possible input music file that shows the bug.
    • Confirm the command used for *this specific* file.

A minimal reproducible example is crucial for debugging. Thank you!