Class URLFill

    • Field Detail

      • log

        private static final org.apache.logging.log4j.Logger log
      • COUNT_URLS_START_VALUE

        private static final int COUNT_URLS_START_VALUE
        Minimum value for the -count_urls option. This option is 1-based: a value of 1 causes PADRE to return a list of hostname-only URLs (e.g. http://example.org).
        See Also:
        Constant Field Values
      • COUNT_URLS_INCREMENT

        private static final int COUNT_URLS_INCREMENT

        How much to increment -count_urls to get the data we need.

        This is "2" because, given a URL http://example.org/folder1/folder2/folder3/page.html and the current scope http://example.org/folder1/:

        • With 1 we only get http://example.org/folder1/folder2, so folder2 will be considered a "page", not a possible sub-folder
        • With 2 we get http://example.org/folder1/folder2/folder3, allowing us to see that folder2 is actually a folder and include it in the facet values

        See COUNT_URLS_START_VALUE and the implementation details of PADRE -count_urls option.

        See Also:
        Constant Field Values
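
        The effect of the increment described above can be illustrated with a small sketch. It does not reproduce PADRE's -count_urls implementation; the class and method names are hypothetical, and the truncation simply keeps a fixed number of path segments below the current scope.

```java
import java.util.Arrays;

// Hypothetical illustration only: imitates truncating a URL a fixed number
// of segments below the current scope. Not PADRE's actual logic.
final class CountUrlsIncrementSketch {

    // Keeps the scope plus at most `extra` path segments found below it.
    static String truncateBelowScope(String url, String scope, int extra) {
        if (!url.startsWith(scope)) {
            throw new IllegalArgumentException("URL is not under the scope: " + url);
        }
        String[] rest = url.substring(scope.length()).split("/");
        int keep = Math.min(extra, rest.length);
        return scope + String.join("/", Arrays.copyOfRange(rest, 0, keep));
    }
}
```

        With one extra segment the result ends at folder2, which cannot be told apart from a page. With two extra segments the result ends at folder2/folder3, which shows that folder2 has children.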
      • DEFAULT_SEGMENTS

        public static final int DEFAULT_SEGMENTS
        Default number of path segments to count when no facet value is selected
        See Also:
        Constant Field Values
      • TAG

        public static final java.lang.String TAG
        Identifier used in query string parameter.
        See Also:
        Constant Field Values
    • Constructor Detail

      • URLFill

        public URLFill​(java.lang.String url)
    • Method Detail

      • convertUrlCountsToFacetUrlAndCount

        public java.util.Map<FacetURL,​java.lang.Integer> convertUrlCountsToFacetUrlAndCount​(SearchTransaction st)
        Takes the URL counts from PADRE and converts them into a map of FacetURL to count. If several URLs flatten down to the same form (e.g. smb://foo/BAR/ is the same as smb://foo/bar/), their counts are summed. The casing of one of the forms is preserved, which may make the results look nicer.
        Parameters:
        st -
        Returns:
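
        A minimal sketch of the summing behaviour described above, assuming plain strings stand in for FacetURL and that "flattening" means lower-casing the whole URL. The class and method names are hypothetical, and keeping the first-seen casing is an assumption (the description only says the casing of one form is kept).

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration: sums counts of URLs that differ only by case.
final class UrlCountMergeSketch {

    static Map<String, Integer> mergeIgnoringCase(Map<String, Integer> urlCounts) {
        // Maps the lower-cased form to the first casing that was seen.
        Map<String, String> firstSeen = new LinkedHashMap<>();
        Map<String, Integer> merged = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : urlCounts.entrySet()) {
            String key = e.getKey().toLowerCase(Locale.ROOT);
            String display = firstSeen.computeIfAbsent(key, k -> e.getKey());
            // Add this count to whatever is already stored for that form.
            merged.merge(display, e.getValue(), Integer::sum);
        }
        return merged;
    }
}
```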
      • getCurrentConstraint

        protected java.util.Optional<java.lang.String> getCurrentConstraint​(SearchQuestion sq)
        Use this to get the single constraint from the list of current constraints. If the response packet URLs are respected, there should never be more than one constraint. If bad input produces several, the first one is used.
        Parameters:
        currentConstraints -
        Returns:
      • getSelectedItems

        public static java.util.List<FacetURL> getSelectedItems​(java.util.Optional<java.lang.String> currentConstraintOpt,
                                                                FacetURL url)
        By looking at the URL the user sets on the facet definition and the current constraint, this returns the selected items that PADRE may return if the result set has documents under those directories. E.g. if the url is foo.com/1/ and the currentConstraint is 1/bar/foo then this will return foo.com/1/bar and foo.com/1/bar/foo.
        Parameters:
        currentConstraint - the current constraint, which will be some path under the user's URL
        url - the prefix set by the user that all URLs must be under. The constraint sits deeper under this prefix, e.g. if url is http://foo.com/ and constraint is bar then we would see URLs under http://foo.com/bar/
        Returns:
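
        The expansion of a constraint into one selected item per directory level can be sketched as follows. This assumes plain strings instead of FacetURL and a constraint expressed relative to the user's prefix, as in the url parameter description. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical illustration: one selected item per level of the constraint.
final class SelectedItemsSketch {

    static List<String> selectedItems(Optional<String> constraint, String prefix) {
        List<String> items = new ArrayList<>();
        if (!constraint.isPresent()) {
            return items; // nothing selected
        }
        // Drop a trailing slash so segments can be appended uniformly.
        String base = prefix.endsWith("/")
                ? prefix.substring(0, prefix.length() - 1)
                : prefix;
        StringBuilder current = new StringBuilder(base);
        for (String segment : constraint.get().split("/")) {
            if (segment.isEmpty()) {
                continue; // ignore leading or doubled slashes
            }
            current.append('/').append(segment);
            items.add(current.toString());
        }
        return items;
    }
}
```

        For the prefix http://foo.com/ and the constraint bar/foo, the sketch yields http://foo.com/bar followed by http://foo.com/bar/foo.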
      • getDepth

        static int getDepth​(FacetURL currentConstraint,
                            FacetURL checkUrl)
        Get the depth of a URL, from the current constraint.
        Parameters:
        currentConstraint - Current constraint URL, such as smb://server/folder1/folder2/
        checkUrl - URL to check, absolute (Ex: smb://server/folder1/folder2/file3.txt)
        Returns:
        the depth of the check URL compared to the current constraint. Will be negative if the check URL is a parent, zero if the check URL is identical to the constraint, or positive if the check URL is deeper (with the value indicating how deep)
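
        One way to obtain a value with these sign properties is to subtract path segment counts. The sketch below assumes plain strings instead of FacetURL, uses hypothetical names, and does not check that the check URL actually shares the constraint's prefix.

```java
// Hypothetical illustration: depth as a difference of path segment counts.
final class DepthSketch {

    // Counts the non-empty path segments after the scheme and host.
    static int pathSegments(String url) {
        int schemeEnd = url.indexOf("://");
        String rest = schemeEnd >= 0 ? url.substring(schemeEnd + 3) : url;
        int pathStart = rest.indexOf('/');
        if (pathStart < 0) {
            return 0; // host only, no path
        }
        int n = 0;
        for (String s : rest.substring(pathStart).split("/")) {
            if (!s.isEmpty()) {
                n++;
            }
        }
        return n;
    }

    // Negative for a parent, zero for the same level, positive when deeper.
    static int depth(String constraint, String checkUrl) {
        return pathSegments(checkUrl) - pathSegments(constraint);
    }
}
```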
      • workoutValue

        private java.util.Optional<org.apache.commons.lang3.tuple.Pair<java.lang.Integer,​CategoryValueComputedDataHolder>> workoutValue​(FacetURL item,
                                                                                                                                              int count,
                                                                                                                                              FacetURL url,
                                                                                                                                              FacetURL urlWithCurrentConstraint)
                                                                                                                                       throws java.io.UnsupportedEncodingException
        Parameters:
        item - comes from padre's URL counts.
        count -
        url -
        currentConstraint - This should be a path under the URL
        Returns:
        Throws:
        java.io.UnsupportedEncodingException
      • stripTrailingSlash

        private static java.lang.String stripTrailingSlash​(java.lang.String str)
      • stripLeadingSlash

        private static java.lang.String stripLeadingSlash​(java.lang.String str)
      • stripLeadingAndTrailingSlash

        private static java.lang.String stripLeadingAndTrailingSlash​(java.lang.String str)
      • getQueryStringCategoryExtraPart

        public java.lang.String getQueryStringCategoryExtraPart()
        Gets the extra part of the query string param name e.g. f.|=value.

        Note that this category definition uses a special tag, url, rather than the metadata class v onto which URLs are mapped internally. For example: f.Category|url instead of f.Category|v.

        v can't be used, because otherwise a URL type category definition could not be distinguished from a metadata field type definition.

        Specified by:
        getQueryStringCategoryExtraPart in class CategoryDefinition
        Returns:
      • matches

        public boolean matches​(java.lang.String value,
                               java.lang.String extraParams)

        Given the value of a query string parameter and any extra parameters, indicates whether this category type is relevant for this parameter.

        For example: f.By Date|dc.date=2010-01-01:

        • value = 2010-01-01
        • extra = dc.date

        A category of type "metadata fill" for the "dc.date" metadata should return true.

        Specified by:
        matches in class CategoryDefinition
        Parameters:
        value - The value to check for.
        extraParams - The extra parameter to check for.
        Returns:
        true if this category definition matches, false otherwise.
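
        The split of a facet parameter name into its facet name and extra part, as in the f.By Date|dc.date example above, can be sketched like this. The class and method names are hypothetical. Splitting on the last | and comparing the extra part with the literal url are assumptions based on the f.Category|url example, not the documented behaviour of matches.

```java
// Hypothetical illustration: splitting "f.<facet name>|<extra>" parameter names.
final class FacetParamSketch {

    // Returns { facetName, extraPart } for a name such as "f.By Date|dc.date".
    static String[] split(String paramName) {
        if (!paramName.startsWith("f.")) {
            throw new IllegalArgumentException("Not a facet parameter: " + paramName);
        }
        int bar = paramName.lastIndexOf('|');
        if (bar < 0) {
            throw new IllegalArgumentException("Missing extra part: " + paramName);
        }
        return new String[] { paramName.substring(2, bar), paramName.substring(bar + 1) };
    }

    // Assumption: a URL category is relevant when the extra part is "url".
    static boolean looksLikeUrlCategory(String extraParams) {
        return "url".equals(extraParams);
    }
}
```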
      • setData

        public void setData​(java.lang.String data)

        Specific data for this category type.

        Depending on the actual type, this can be a metadata class, a query expression, etc.

        Overrides:
        setData in class CategoryDefinition
      • getSelectedValues

        public java.util.stream.Stream<java.lang.String> getSelectedValues​(SearchQuestion question)
      • getQueryProcessorOptions

        public java.util.List<QueryProcessorOption<?>> getQueryProcessorOptions​(SearchQuestion question)
        Description copied from class: CategoryDefinition

        Get additional query processor options to apply for this category definition.

        This gives the category definition the opportunity to add any QPOs it needs. QPOs may differ depending on whether the facet is currently selected, e.g. setting -count_urls dynamically based on the current number of segments in the URL drill-down facet.

        Specified by:
        getQueryProcessorOptions in class CategoryDefinition
        Parameters:
        question - Can be used to inspect the currently selected facets and return appropriate QPOs
        Returns:
        A list of query processor options
      • fullCurrentConstraintForCounting

        protected java.lang.String fullCurrentConstraintForCounting​(SearchQuestion sq)
        Must return a string whose segments will be counted to determine how deep -count_urls needs to be. The returned value must include the host.
        Parameters:
        sq -
        Returns:
      • joinConstraintToUserURLPrefix

        FacetURL joinConstraintToUserURLPrefix​(java.lang.String constraint)
        Joins the selected constraint (which is a path under the user URL prefix) to the user URL prefix. This takes care of slashes, making some effort to avoid joining with double slashes and to add a trailing slash so that the prefix check works.
        Parameters:
        constraint -
        Returns:
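
        A minimal sketch of the slash handling described above, assuming plain strings instead of FacetURL. The class and method names are hypothetical and the real method may treat edge cases differently.

```java
// Hypothetical illustration: joining a prefix and a constraint with exactly
// one slash between them and a trailing slash at the end.
final class JoinConstraintSketch {

    static String join(String prefix, String constraint) {
        // Remove trailing slashes from the prefix.
        String left = prefix.replaceAll("/+$", "");
        // Remove leading and trailing slashes from the constraint.
        String right = constraint.replaceAll("^/+|/+$", "");
        return right.isEmpty() ? left + "/" : left + "/" + right + "/";
    }
}
```

        The trailing slash matters for prefix checks: without it, http://foo.com/bar would also be a prefix of http://foo.com/barn/.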
      • countSegments

        public static int countSegments​(java.lang.String urlSubstring)

        Counts the number of segments in the path component of a URL substring

      • allValuesDefinedByUser

        public boolean allValuesDefinedByUser()
        Description copied from class: CategoryDefinition
        Tells you whether all the CategoryValues this CategoryDefinition can produce are ones that must be set on the category by the user.

        Values defined by the user are ones like gscopes, whereas values not defined by the user come from other sources such as metadata.

        Specified by:
        allValuesDefinedByUser in class CategoryDefinition
        Returns:
        true if all values are defined by the user and not generated from the data.