C1TextParser

Overview

This view shows the basic features of the Starts-After-Continues-Until extractor (the StartsAfterContinuesUntil class).

Features

  • Sample Applications

  • Starts After Continues Until Extractor

    The Starts-After-Continues-Until extractor is the simplest extractor and the easiest to use. It is designed to extract relevant text from a plain-text source. To use it, define two parameters, each a regular expression: one that marks where the text starts and one that marks where it ends (that is, what it continues until). Essentially, it extracts all the text contained between occurrences of the two regular expressions.
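
    The matching behaviour described above can be approximated with plain System.Text.RegularExpressions. The following is only a sketch, not the C1.TextParser implementation: the class name StartsAfterContinuesUntilSketch and its Extract method are hypothetical, and two behaviours are assumptions inferred from the sample output on this page (the search resumes after each end match, and a start match with no following end match produces no result).

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only; it does not use C1.TextParser.
public static class StartsAfterContinuesUntilSketch
{
    // Returns one (StartIndex, ExtractedText) pair per start/end match pair.
    // StartIndex is the index of the first character after the start match.
    public static List<(int StartIndex, string ExtractedText)> Extract(
        string input, string startsAfter, string continuesUntil)
    {
        var results = new List<(int StartIndex, string ExtractedText)>();
        var startRegex = new Regex(startsAfter);
        var endRegex = new Regex(continuesUntil);
        int position = 0;

        while (position <= input.Length)
        {
            Match start = startRegex.Match(input, position);
            if (!start.Success) break;

            // The extracted text begins right after the start match.
            int textStart = start.Index + start.Length;
            Match end = endRegex.Match(input, textStart);
            if (!end.Success) break; // assumption: an unterminated start yields nothing

            results.Add((textStart, input.Substring(textStart, end.Index - textStart)));

            // Assumption: resume after the end match; always advance by at least
            // one character so that zero-length matches cannot loop forever.
            position = Math.Max(end.Index + end.Length, position + 1);
        }
        return results;
    }

    public static void Main()
    {
        string input = "Foxes are small. About 25 current species.[1] Done";
        foreach (var item in Extract(input, @"[a-zA-Z]{5}", @"([+-])?[0-9]+"))
        {
            Console.WriteLine($"{item.StartIndex}: \"{item.ExtractedText}\"");
        }
        // Prints:
        // 5: " are small. About "
        // 31: "nt species.["
    }
}
```

    With the same two patterns that the sample controller uses, the first extraction starts after "Foxes" (the first run of five letters) and stops before "25"; the second starts after "curre" in "current" and stops before "1". This mirrors the shape of the extracted result shown on this page.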

    Input file

    Foxes are small-to-medium-sized, omnivorous mammals belonging to several genera of the family Canidae. Foxes have a flattened skull, upright triangular ears, a pointed, slightly upturned snout, and a long bushy tail (or brush). Twelve species belong to the monophyletic "true foxes" group of genus Vulpes. Approximately another 25 current or extinct species are always or sometimes called foxes; these foxes are either part of the paraphyletic group of the South American foxes, or of the outlying group, which consists of bat-eared fox, gray fox, and island fox.[1] Foxes live on every continent except Antarctica. By far the most common and widespread species of fox is the red fox (Vulpes vulpes) with about 47 recognized subspecies.[2] The global distribution of foxes, together with their widespread reputation for cunning, has contributed to their prominence in popular culture and folklore in many societies around the world. The hunting of foxes with packs of hounds, long an established pursuit in Europe, especially in the British Isles, was exported by European settlers to various parts of the New World.

    Extracted result

    {
      "Extractor": "StartsAfterContinuesUntil",
      "Result": [
        {
          "StartIndex": 5,
          "ExtractedText": " are small-to-medium-sized, omnivorous mammals belonging to several genera of the family Canidae. Foxes have a flattened skull, upright triangular ears, a pointed, slightly upturned snout, and a long bushy tail (or brush). Twelve species belong to the monophyletic \"true foxes\" group of genus Vulpes. Approximately another "
        },
        {
          "StartIndex": 336,
          "ExtractedText": "nt or extinct species are always or sometimes called foxes; these foxes are either part of the paraphyletic group of the South American foxes, or of the outlying group, which consists of bat-eared fox, gray fox, and island fox.["
        },
        {
          "StartIndex": 572,
          "ExtractedText": " live on every continent except Antarctica. By far the most common and widespread species of fox is the red fox (Vulpes vulpes) with about "
        },
        {
          "StartIndex": 719,
          "ExtractedText": "nized subspecies.["
        }
      ]
    }

    Controller

    using System.Collections;
    using System.Globalization;
    using System.Linq;
    using System.Web.Mvc;
    using C1.Web.Mvc;
    using SamplesExplorer.Models;
    using System.Collections.Generic;
    using System;
    using C1.TextParser;
    using System.IO;
    using System.Text;
    
    namespace SamplesExplorer.Controllers
    {
        public partial class C1TextParserController : Controller
        {
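            // Extracts text from the sample input file and passes the JSON result to the view.
            public ActionResult StartsAfterContinuesUntilExtractor(FormCollection collection)
            {
                // Extraction starts after any run of five letters and continues until the next (optionally signed) integer.
                StartsAfterContinuesUntil startsAfterContinuesUntil = new StartsAfterContinuesUntil(@"[a-zA-Z]{5}", @"([+-])?[0-9]+");
    
                // Open the sample file read-only; the extractor only needs to read it.
                using (var inputStream = System.IO.File.OpenRead(Server.MapPath("~/Content/sampleFiles/input.txt")))
                {
                    IExtractionResult result = startsAfterContinuesUntil.Extract(inputStream);
                    ViewBag.ExtractionResult = result.ToJsonString();
                }
                return View();
            }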
        }
    }

    View
    
    @section Summary{
        <p>@Html.Raw(Resources.C1TextParser.StartsAfterExtractor_Text0)</p>
    }
    
    <div>
        <div>
            <h3>@Html.Raw(Resources.C1TextParser.StartsAfterExtractor_Title)</h3>
    
            <p>@Html.Raw(Resources.C1TextParser.StartsAfterExtractor_Text1)</p>
        </div>
        <div>
            <h3>Input file</h3>
            <pre class="scrollable-pre">@Html.Raw(ControlPages.GetSampleFileContent("input.txt"))</pre>
        </div>
        <div>
    
            <h3>Extracted result</h3>
            <pre class="scrollable-pre">@Html.Raw(ViewBag.ExtractionResult)</pre>
        </div>
    </div>